The National Football League (NFL) is a professional sports league for American football consisting of 32 teams, split evenly between the National Football Conference (NFC) and American Football Conference (AFC). Each conference is currently split into 4 divisions (North, South, East, West), each with 4 teams in it. For more information on the NFL and for different football terms, check out the NFL page. The NFL playoffs have expanded and changed quite a lot over the years, but here's a brief overview.
Starting in 1967, the champion of the AFL (now the AFC) and the champion of the NFL (now the NFC) faced off against each other in the Super Bowl. Each league started off with just 2 divisions, and the winner of each division made the playoffs, meaning 4 teams made the playoffs in total. The winner of each league's division playoff game would become the league champion (either AFL or NFL) and would face the other league champion in the Super Bowl.
When the NFL and AFL merged in 1970, the league expanded to 3 divisions for each conference (NFC and AFC), so the 3 division winners and a wildcard team (i.e. the team with the best record after the division winners) made the playoffs in each conference, totaling 8 teams in the playoffs. The wildcard would play the #1 seed. The winner of that game would then face the winner of the #2 vs. #3 game, and the winner of that would be the conference champion (now either NFC or AFC). The two conference champions then faced each other in the Super Bowl.
In 1978, a second wildcard was added to each conference, where the wildcard teams would face each other in the wildcard round. The winner would then face the highest seeded division winner. The rest of the pairings are the same as the 1970 playoff rules. This made it so that 10 teams made the playoffs.
In 1990, a third wildcard was added to each conference. The division champions were labeled from 1 to 3 in terms of their standings, and the wildcard teams were labeled from 4 to 6 by their standings. In the wildcard round, the #3 and #6 seeds and the #4 and #5 seeds would face one another while the #1 and #2 seeds received a first-round bye. The lowest seed that won would then face the #1 seed, and the higher seeded team would face the #2 seed. The winner of each of those match-ups would then face each other to determine the conference champion. Then, each conference champion would face off against each other in the Super Bowl. The higher seed would be guaranteed a home playoff game. This made it so that 12 teams made the playoffs.
In 2002, the league expanded to 4 divisions per conference, so there were now 4 division champions and 2 wildcard teams per conference. However, the same system was more or less kept in place until 2020.
In 2020, a third wildcard team was added to each conference. The #1 seed is the only one given a first-round bye, meaning that the #2 and #7 seeds, the #3 and #6 seeds, and the #4 and #5 seeds all face each other in the wildcard round. The division winners would be guaranteed homefield advantage for their first playoff game, but the highest seed between two pairings always gets homefield advantage. After the wildcard round, 3 teams are eliminated, leaving only 4. The #1 seed, after resting during the first-round bye, would then pair up against the lowest remaining seed, and the 2 teams remaining then face off against one another. The winners of each of those games then face off against each other, and the winner is the conference champion. As you know by now, each conference champion then faces off against the other in the Super Bowl. Currently, 14 teams make the playoffs each year.
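The pairing rules of the current format can be written down as a short sketch. This is only an illustration of the rules described above, not part of the prediction model; the function names `wildcard_pairings` and `divisional_pairings` are made up for this example, and seeds are represented as plain integers from 1 to 7.

```python
def wildcard_pairings():
    """Wildcard-round matchups in the current format; the #1 seed has a bye."""
    return [(2, 7), (3, 6), (4, 5)]


def divisional_pairings(wildcard_winners):
    """Pair the #1 seed with the lowest remaining seed (the largest seed number).

    The other two wildcard winners play each other. In each tuple the
    better seed is listed first.
    """
    remaining = sorted(wildcard_winners)
    if len(remaining) != 3:
        raise ValueError("expected exactly three wildcard-round winners")
    lowest = remaining[-1]
    return [(1, lowest), (remaining[0], remaining[1])]


# Example: the #2, #6 and #5 seeds win their wildcard games.
print(divisional_pairings([2, 6, 5]))  # [(1, 6), (2, 5)]
```

Listing the better seed first in each tuple matches the rule that the higher seed hosts the game.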
As seen above, the playoffs have changed quite a lot over the years. Because of these shifts, I will be limiting the dataset to seasons from 2002 onwards. This is the modern incarnation of the NFL playoffs as we know them today, and is also the most relevant in trying to predict the trajectory of modern-day teams. In addition, there were so many changes to several teams between 1990 and 2002 that it didn't make sense to lump them together in the same groups. From 2002 onwards, all 32 of the teams in the league fully exist, and the only changes were some relocations; the division and conference that each team belongs to have remained constant, which saves me some headache when trying to classify all these teams.
For this project, I want to try and analyze the various playoff and championship teams to predict which teams have the best shot at the championship for each year. We will try to predict which teams will make the playoffs, how many rounds they're predicted to win, and attempt to see which team will be hailed as that year's Super Bowl champion. We'll use data starting from the 2002-03 season up until the 2021-22 season, and we'll only use the regular season data of the 2022-23 season. We'll then see how close this model gets to predicting the playoff wins and Super Bowl winner of the 2022-23 season.
Here are all the libraries we'll use for this project.
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup
import re
import plotly.io as pio
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import statsmodels.api as sm
import statsmodels.formula.api as smf
import time
import os.path
import string
import math
pio.renderers.default = "notebook+plotly_mimetype+png+jpeg+svg+pdf"
To start off, we need data for the offense and defense of each team from every season between 2002-03 and 2022-23. All of this data can be found on NFL.com. Offense is broken up into 5 different tables (Passing, Rushing, Receiving, Scoring, Downs), while defense is broken up into 6 tables (Passing, Rushing, Scoring, Downs, Fumbles, Interceptions). We'll ignore special teams data since it isn't as relevant as offense and defense. This means that for each year, there will be 11 different tables of data. With 21 seasons of data at our disposal, that totals 231 different tables of data that we'll have to keep track of initially.
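As a quick sanity check on that arithmetic, here is a minimal sketch that counts the tables directly. The list names mirror the ones used in the scraping code; `seasons`, `tables_per_season` and `total_tables` are names introduced just for this check.

```python
o_stats = ["passing", "rushing", "receiving", "scoring", "downs"]
d_stats = ["passing", "rushing", "scoring", "downs", "fumbles", "interceptions"]
seasons = range(2002, 2023)  # 2002 through 2022 inclusive

tables_per_season = len(o_stats) + len(d_stats)
total_tables = tables_per_season * len(seasons)
print(tables_per_season, len(seasons), total_tables)  # 11 21 231
```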
We'll use BeautifulSoup to help us obtain the data. We'll do this by looping through each year, each stat, and alternating between offense and defense. Some stats only exist for one side (fumbles and interceptions are defense-only, while receiving is offense-only), so we account for that so as not to cause an error when requesting the web page. I also added a time.sleep call to avoid being rate-limited.
For now, we'll store the data in CSV files the first time around (the file will exist afterwards, so there's no need to run it again). It took an hour the first time I ran this, so we certainly don't want to have to wait that long each time.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36',
}
off_def = ["offense", "defense"]
o_stats = ["passing", "rushing", "receiving", "scoring", "downs"]
d_stats = ["passing", "rushing", "scoring", "downs", "fumbles", "interceptions"]
stats = d_stats + ["receiving"]
years = list(range(2002, 2023)) # 2002 to 2022
for year in years:
    for stat in stats:
        for type in off_def:
            # Check if the type + stat combo exists before proceeding
            if (type == "offense" and stat in o_stats) or (type == "defense" and stat in d_stats):
                csv_name = f'OFF+DEF Data/{year}_{type}_{stat}.csv'
                # Only run if file doesn't already exist
                if not os.path.exists(csv_name):
                    r = requests.get(f'https://www.nfl.com/stats/team-stats/{type}/{stat}/{year}/reg/all', headers=headers)
                    soup = BeautifulSoup(r.content, 'html.parser')
                    soup = soup.find('table')
                    data_df = pd.read_html(str(soup))[0]
                    data_df.to_csv(csv_name, index=False)
                    # Pause between requests to avoid being rate-limited
                    time.sleep(3.0)
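The "only fetch if the CSV is missing" pattern shows up in every scraping cell, so it could be factored into one helper. The sketch below is an assumption about how that might look, not the code the project actually runs: `load_or_fetch` and `fake_fetch` are hypothetical names, and the stand-in fetcher returns made-up numbers so the example never touches the network. It also creates the target folder first, since `to_csv` will not create missing folders on its own.

```python
import os
import tempfile

import pandas as pd


def load_or_fetch(csv_name, fetch):
    """Return the cached CSV if it exists; otherwise call fetch() and cache it.

    `fetch` is any zero-argument callable returning a DataFrame (hypothetical;
    in this project it would wrap the requests + read_html calls).
    """
    if os.path.exists(csv_name):
        return pd.read_csv(csv_name)
    df = fetch()
    folder = os.path.dirname(csv_name)
    if folder:
        os.makedirs(folder, exist_ok=True)  # to_csv does not create folders
    df.to_csv(csv_name, index=False)
    return df


# Usage with a stand-in fetcher (made-up data, no network access).
calls = []


def fake_fetch():
    calls.append(1)
    return pd.DataFrame({"Team": ["BillsBills"], "Att": [10]})


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "OFF+DEF Data", "2002_offense_passing.csv")
    first = load_or_fetch(path, fake_fetch)
    second = load_or_fetch(path, fake_fetch)  # served from the cached file
    print(len(calls))  # 1
```

Passing the fetcher in as a callable keeps the caching logic separate from the site-specific scraping, which also makes it easy to try out offline.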
One thing that the NFL data was missing was the regular season record of each team. This is obviously an important metric that we need for our model, so I had to scrape that data from Pro Football Reference. Once again, we'll loop through each year and each conference using BeautifulSoup to obtain our data. We're also storing this information in CSV files the first time around (again, the file will exist afterwards, so there's no need to run it again).
conferences = ["AFC", "NFC"]
for year in years:
    for conference in conferences:
        csv_name = f'Standings Data/{year}_standings_{conference}.csv'
        # Only run if file doesn't already exist
        if not os.path.exists(csv_name):
            r = requests.get(f'https://www.pro-football-reference.com/years/{year}/', headers=headers)
            soup = BeautifulSoup(r.content, 'html.parser')
            # The AFC standings are the first table on the page, the NFC standings the second
            index = 0 if conference == 'AFC' else 1
            soup = soup.find_all('table')[index]
            data_df = pd.read_html(str(soup))[0]
            data_df.to_csv(csv_name, index=False)
            time.sleep(3.0)
We also needed the past Super Bowl winners for our model, so we found that information from Topend Sports. We just need to do one call since it's only one table that holds all the data.
csv_name = 'Super Bowl Winners/winners.csv'
if not os.path.exists(csv_name):
    r = requests.get('https://www.topendsports.com/events/super-bowl/winners-list.htm', headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    soup = soup.find('table')
    data_df = pd.read_html(str(soup))[0]
    data_df.to_csv(csv_name, index=False)
Now that we have all the data, it's time to merge them together into one big dataframe. The process is the following:
1. Reading in the CSV files for each year (the stats stored in stat_df, the standings stored in standings_df) and combining them together to form a dataframe for the year (year_df). Before we do that, however, we're going to have to rename some of the column names for the stats since there's too much overlap between them
2. Appending the dataframe for the year, year_df, to the overall dataframe, nfl_df
3. Merging the Super Bowl dataframe, sb_df, with nfl_df to form our complete dataframe
It may not seem like much, but there are a lot of CSV files. For stats alone, there are 11 offensive and defensive stat tables per year, totaling 11 × 21 = 231 files. For standings, there are 2 separate tables per year (one for each conference), totaling 2 × 21 = 42 files. This, of course, is neglecting all the changes we have to make in the columns and the dataframes, of which there are a lot.
Let's start with the easy part: defining some of the variables we need:
- nfl_df is our overall dataframe
- o_d_dict is a dictionary that shortens "offense" and "defense"
- stat_dict is a dictionary that shortens the various stats
- team_dict is a dictionary that maps the names of teams from the stats dataset to the standings dataset
o_d_dict and stat_dict will be used for changing the column names for stat_df since otherwise we'd have too much overlap between names.
# Overall df
nfl_df = pd.DataFrame()
# Dictionaries helping with renaming stats
o_d_dict = {'offense':'off', 'defense':'def'}
stat_dict = {'passing':'Pass ',
             'rushing':'Rush ',
             'receiving':'Rec ',
             'scoring':'Scor ',
             'downs':'Down ',
             'fumbles':'Fumb ',
             'interceptions':'Int '
            }
# Dictionary helping with mapping name of team from one dataset to the other
team_dict = {'CardinalsCardinals':'Arizona Cardinals',
             'FalconsFalcons':'Atlanta Falcons',
             'RavensRavens':'Baltimore Ravens',
             'BillsBills':'Buffalo Bills',
             'BearsBears':'Chicago Bears',
             'BengalsBengals':'Cincinnati Bengals',
             'BrownsBrowns':'Cleveland Browns',
             'CowboysCowboys':'Dallas Cowboys',
             'BroncosBroncos':'Denver Broncos',
             'LionsLions':'Detroit Lions',
             'PackersPackers':'Green Bay Packers',
             'ColtsColts':'Indianapolis Colts',
             'ChiefsChiefs':'Kansas City Chiefs',
             'RaidersRaidersLV':'Las Vegas Raiders',
             'ChargersChargersLA':'Los Angeles Chargers',
             'RamsRamsLA':'Los Angeles Rams',
             'RaidersRaiders':'Oakland Raiders',
             'DolphinsDolphins':'Miami Dolphins',
             'VikingsVikings':'Minnesota Vikings',
             'PatriotsPatriots':'New England Patriots',
             'SaintsSaints':'New Orleans Saints',
             'GiantsGiants':'New York Giants',
             'JetsJets':'New York Jets',
             'EaglesEagles':'Philadelphia Eagles',
             'SteelersSteelers':'Pittsburgh Steelers',
             'ChargersChargers':'San Diego Chargers',
             '49ers49ers':'San Francisco 49ers',
             'NinersNiners':'San Francisco 49ers',
             'SeahawksSeahawks':'Seattle Seahawks',
             'RamsRams':'St. Louis Rams',
             'BuccaneersBuccaneers':'Tampa Bay Buccaneers',
             'Football TeamFootball Team':'Washington Football Team',
             'CommandersCommanders':'Washington Commanders',
             'RedskinsRedskins':'Washington Redskins',
             'TexansTexans':'Houston Texans',
             'PanthersPanthers':'Carolina Panthers',
             'TitansTitans':'Tennessee Titans',
             'JaguarsJaguars':'Jacksonville Jaguars'
            }
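The suffixed keys (such as 'RamsRamsLA' and 'RaidersRaidersLV') exist because the stats data is assumed to use the same raw label for a franchise before and after a move, so the season decides which full name applies. Here is a minimal sketch of that lookup on an excerpt of the mapping; `team_dict_excerpt`, `relocations` and `full_name` are hypothetical names for this illustration only.

```python
# A small excerpt of the mapping, for illustration only.
team_dict_excerpt = {
    "RamsRams": "St. Louis Rams",
    "RamsRamsLA": "Los Angeles Rams",
    "ChargersChargers": "San Diego Chargers",
    "ChargersChargersLA": "Los Angeles Chargers",
    "RaidersRaiders": "Oakland Raiders",
    "RaidersRaidersLV": "Las Vegas Raiders",
}

# First season in the new city, and the suffix that selects the new key.
relocations = {
    "RamsRams": (2016, "LA"),
    "ChargersChargers": (2017, "LA"),
    "RaidersRaiders": (2020, "LV"),
}


def full_name(raw_name, year):
    """Hypothetical helper: resolve a raw stats label to a full team name."""
    if raw_name in relocations:
        first_year, suffix = relocations[raw_name]
        if year >= first_year:
            raw_name += suffix
    return team_dict_excerpt[raw_name]


print(full_name("RamsRams", 2015))        # St. Louis Rams
print(full_name("RamsRams", 2016))        # Los Angeles Rams
print(full_name("RaidersRaiders", 2020))  # Las Vegas Raiders
```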
Next, let's define a helper function for when we combine our various stats into stat_df. We'll pass in stat_df, the year, the type of "fense" we're looking at (offense vs defense), and the stat. Here's what our function will do:
1. Check whether type and stat match up. If they do, that's when we make a call to its corresponding CSV file for the year
2. Use o_d_dict and stat_dict to create a prefix we can add to all columns (except for Team since that's the column we'll be merging on). This will help us differentiate between the different stats, since many of them have the same name as other stats
3. Update the names of relocated teams so that they map to the correct keys in team_dict
4. Merge the result into stat_df, and this will hold all the stats for a certain year
def make_stats(stat_df, year, type, stat):
    # Check if the type + stat combo exists before proceeding
    if (type == "offense" and stat in o_stats) or (type == "defense" and stat in d_stats):
        df = pd.read_csv(f'OFF+DEF Data/{year}_{type}_{stat}.csv')
        # Prefix every column except Team (note: ('Team') is a plain string, so compare with == rather than `in`)
        prefix = o_d_dict[type] + stat_dict[stat]
        df = df.rename(columns=lambda col: col if col == 'Team' else f"{prefix}{col}")
        # Update team names to make it easier to merge with standings data
        # St Louis Rams relocated to Los Angeles in 2016, San Diego Chargers in 2017
        if year >= 2016:
            la_teams = ['RamsRams'] + (['ChargersChargers'] if year >= 2017 else [])
            # Use .loc so we assign into df itself rather than a temporary copy
            df.loc[df['Team'].isin(la_teams), 'Team'] += 'LA'
        # Oakland Raiders relocated to Las Vegas in 2020
        if year >= 2020:
            df.loc[df['Team'] == 'RaidersRaiders', 'Team'] += 'LV'
        # Merge this stat table into the running dataframe for the year
        if stat_df is None or stat_df.empty:
            stat_df = df
        else:
            stat_df = stat_df.merge(df, on="Team", how='outer')
    return stat_df
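To see why the prefix matters, here is a small sketch with two toy tables that share a column name. The numbers are made up and `add_prefix_except_team` is a hypothetical helper; only the prefix-then-merge idea is taken from the function above.

```python
import pandas as pd

# Toy stand-ins for two stat tables that share a column name (made-up numbers).
off_pass = pd.DataFrame({"Team": ["BillsBills", "JetsJets"], "Yds": [4000, 3500]})
def_rush = pd.DataFrame({"Team": ["BillsBills", "JetsJets"], "Yds": [1600, 1900]})


def add_prefix_except_team(df, prefix):
    # Compare with == here: ('Team') is just the string 'Team', so
    # `col not in ('Team')` would be a substring test, not a membership test.
    return df.rename(columns=lambda col: col if col == "Team" else f"{prefix}{col}")


merged = add_prefix_except_team(off_pass, "offPass ").merge(
    add_prefix_except_team(def_rush, "defRush "), on="Team", how="outer"
)
print(list(merged.columns))  # ['Team', 'offPass Yds', 'defRush Yds']
```

Without the prefixes, pandas would fall back to its default `_x` and `_y` suffixes for the clashing `Yds` columns, which gets hard to read once eleven tables are merged.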
Now, let's move on to the standings. We'll also create a helper function for this that takes in standings_df, as well as the year and conference we're looking at (either AFC or NFC). This function will:
1. Read in the standings CSV for the given year, which holds the records of the conference at large
2. Record which conference a team is in (will be helpful in overall dataframe nfl_df)
3. Clean up the Tm column (flag playoff teams by the '+' or '*' after their name, strip those markers, and rename the column to Team) so that we can merge the stat_df with it
4. Update standings_df with all this data so that it holds both AFC and NFC standings for a particular year
def make_standings(standings_df, year, conference):
    # Read in file
    df = pd.read_csv(f'Standings Data/{year}_standings_{conference}.csv')
    # Specify conference
    df['Conference'] = conference
    df = df.rename(columns={'Tm':'Team'})
    # Determine which teams made playoffs based off if they have '+' or '*' after their name
    df['Made Playoffs'] = df['Team'].str.contains(r'[+*]', regex=True, na=False)
    # Remove '+' and '*' from `Team`
    df['Team'] = df['Team'].str.replace(r'[+*]', '', regex=True)
    # Merge standings_df
    if standings_df is None or standings_df.empty:
        standings_df = df
    else:
        standings_df = pd.concat([standings_df, df])
    return standings_df
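Here is what the playoff flagging and marker stripping look like on a few toy rows. The rows are made up purely for illustration (they are not real standings), and the vectorized string methods shown are one way to do it without looping over rows.

```python
import pandas as pd

# Made-up rows: '*' and '+' are the markers that follow a playoff team's name.
standings = pd.DataFrame(
    {"Tm": ["New England Patriots*", "New York Jets+", "Miami Dolphins"]}
)
standings = standings.rename(columns={"Tm": "Team"})

# Flag first, then strip, so the marker is still present when we test for it.
standings["Made Playoffs"] = standings["Team"].str.contains(r"[+*]", regex=True)
standings["Team"] = standings["Team"].str.replace(r"[+*]", "", regex=True)
print(standings)
```

Inside a character class, `+` and `*` are matched literally, so no escaping is needed.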
At this point (in our main code, outside these helpers), we have all the stats and standings data stored in nfl_df. However, we also want to know who won the Super Bowl for a given year (as well as the runner-up), so we have to create sb_df (which holds the Super Bowl data). We'll create a helper for this that will take in sb_df and:
1. Decrease Year by 1. The regular season occurs in year x, but the Super Bowl takes place in year x+1. However, the winner of the Super Bowl in year x+1 wins the Super Bowl for the season starting in x. For instance, the New York Giants won the Super Bowl in 2008 for the 2007 season, so we would say the Super Bowl winner for the 2007 season was the New York Giants. We must account for this if we want to join the values together
2. Keep only Year, Winner, and Opposition since those are the only relevant ones. We will also rename Winner to SB Winner and Opposition to SB Runner-Up
Once this is all done, we can merge nfl_df with sb_df.
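def make_sb(sb_df):
    # Work on a copy so the caller's dataframe isn't modified
    sb_df = sb_df.copy()
    # Super Bowl for 2022 season is held in 2023, need to subtract to align both of these
    sb_df['Year'] -= 1
    sb_df = sb_df.drop(columns=['No.', 'Score', 'Venue'])
    # Keep the first 21 rows (the 2002 to 2022 seasons, assuming the table lists the newest game first)
    sb_df = sb_df[:21]
    sb_df = sb_df.rename(columns={'Opposition':'Runner-Up'})
    # Prefix every column except Year with 'SB '
    sb_df = sb_df.rename(columns=lambda col: col if col == 'Year' else f'SB {col}')
    return sb_df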
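The year shift is easy to get backwards, so here is a small sketch of it on two rows shaped like the winners table, joined onto a toy team-season table. The frame names `sb`, `seasons` and `joined` are made up for this example, and only two teams are shown.

```python
import pandas as pd

# Two rows shaped like the winners table (Year = calendar year of the game).
sb = pd.DataFrame({
    "Year": [2009, 2008],
    "Winner": ["Pittsburgh Steelers", "New York Giants"],
    "Opposition": ["Arizona Cardinals", "New England Patriots"],
})
sb["Year"] -= 1  # calendar year of the game -> season it decided
sb = sb.rename(columns={"Winner": "SB Winner", "Opposition": "SB Runner-Up"})

# One row per team-season, as in nfl_df (only two teams shown).
seasons = pd.DataFrame({
    "Team": ["New York Giants", "Dallas Cowboys"],
    "Year": [2007, 2007],
})
joined = seasons.merge(sb, on="Year", how="left")
print(joined)
```

Every team from the same season gets the same SB Winner value after the merge, so a per-team flag would be a comparison such as `joined["Team"] == joined["SB Winner"]`.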
Now that we have all of our helpers, we can actually merge the data together to create nfl_df. We will be:
1. Looping through each year, stats, and type and calling make_stats to construct stat_df
2. Renaming the Team names in stat_df using team_dict to make merging with standings data easier
3. Updating year_df with stat_df after the function call
4. Looping through each year and conference and calling make_standings to construct standings_df
5. Merging year_df with standings_df after the function call
6. Appending year_df to nfl_df since we're done with this year and will be moving on to the next year
7. Calling make_sb to construct sb_df outside of the loops
8. Merging nfl_df and sb_df
And just like that, we've created the dataframe we'll be working with for the rest of this project.
for year in years:
    year_df = pd.DataFrame()
    # Combine all offensive + defensive stats together
    stat_df = pd.DataFrame()
    for stat in stats:
        for type in off_def:
            stat_df = make_stats(stat_df, year, type, stat)
    # Rename team names to make it easier to merge with standings data
    # (assign the result back instead of calling replace(inplace=True) on a column selection)
    stat_df['Team'] = stat_df['Team'].replace(team_dict)
    year_df = stat_df.copy(deep=True)
    # Combine standings data together
    standings_df = pd.DataFrame()
    for conference in conferences:
        standings_df = make_standings(standings_df, year, conference)
    # Merge year and standings df w/ one another
    year_df = standings_df.merge(year_df, on='Team', how='outer')
    year_df['Year'] = year
    # Add year_df onto bottom of nfl_df
    nfl_df = pd.concat([nfl_df, year_df], ignore_index=True)
# Create Super Bowl data and merge with nfl_df
sb_df = pd.read_csv('Super Bowl Winners/winners.csv')
sb_df = make_sb(sb_df)
nfl_df = nfl_df.merge(sb_df, on='Year', how='outer')
nfl_df
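Because the stats and standings are joined with an outer merge, a team name that fails to map quietly produces two half-empty rows instead of an error. One way to check for that, sketched here on toy data with made-up numbers, is the `indicator=True` option of `merge`; the frame names below are hypothetical and one name is left unmapped on purpose.

```python
import pandas as pd

# Toy data: 'NinersNiners' was deliberately left unmapped to show the effect.
stats = pd.DataFrame({"Team": ["Buffalo Bills", "NinersNiners"], "offPass Yds": [4000, 3800]})
standings = pd.DataFrame({"Team": ["Buffalo Bills", "San Francisco 49ers"], "W": [11, 10]})

# indicator=True adds a _merge column: 'both', 'left_only' or 'right_only'.
check = standings.merge(stats, on="Team", how="outer", indicator=True)
unmatched = check.loc[check["_merge"] != "both", ["Team", "_merge"]]
print(unmatched)
```

An empty `unmatched` frame would suggest every team lined up across the two sources for that year.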
set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy 
C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2390441881.py:14: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2390441881.py:14: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: 
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be 
set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2390441881.py:14: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2390441881.py:14: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy 
C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: 
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be 
set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy 
C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2390441881.py:14: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2390441881.py:14: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: 
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be 
set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy 
C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2390441881.py:14: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: 
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2390441881.py:14: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be 
set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy 
C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: 
https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:19: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2351266730.py:22: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2390441881.py:14: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy C:\Users\aabag\AppData\Local\Temp\ipykernel_23484\2390441881.py:14: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
| | Team | W | L | T | W-L% | PF | PA | PD | MoV | SoS | ... | offRec TD | offRec 20+ | offRec 40+ | offRec Lng | offRec Rec 1st | offRec Rec 1st% | offRec Rec FUM | Year | SB Winner | SB Runner-Up |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | New York Jets | 9 | 7 | 0.0 | 0.563 | 359 | 336 | 23 | 1.4 | 1.7 | ... | 25 | 44 | 3 | 47T | 190 | 57.8 | 4 | 2002 | Tampa Bay Buccaneers | Oakland Raiders |
| 1 | New England Patriots | 9 | 7 | 0.0 | 0.563 | 381 | 346 | 35 | 2.2 | 1.8 | ... | 28 | 37 | 3 | 49 | 184 | 49.2 | 5 | 2002 | Tampa Bay Buccaneers | Oakland Raiders |
| 2 | Miami Dolphins | 9 | 7 | 0.0 | 0.563 | 378 | 301 | 77 | 4.8 | 1.2 | ... | 18 | 38 | 5 | 77T | 155 | 57.2 | 6 | 2002 | Tampa Bay Buccaneers | Oakland Raiders |
| 3 | Buffalo Bills | 8 | 8 | 0.0 | 0.500 | 379 | 397 | -18 | -1.1 | 0.9 | ... | 24 | 45 | 13 | 73 | 218 | 57.8 | 4 | 2002 | Tampa Bay Buccaneers | Oakland Raiders |
| 4 | Pittsburgh Steelers | 10 | 5 | 1.0 | 0.656 | 390 | 345 | 45 | 2.8 | -0.1 | ... | 26 | 51 | 8 | 72 | 199 | 56.9 | 6 | 2002 | Tampa Bay Buccaneers | Oakland Raiders |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 667 | Atlanta Falcons | 7 | 10 | 0.0 | 0.412 | 365 | 386 | -21 | -1.2 | -0.9 | ... | 17 | 37 | 5 | 75T | 148 | 57.6 | 4 | 2022 | Kansas City Chiefs | Philadelphia Eagles |
| 668 | San Francisco 49ers | 13 | 4 | 0.0 | 0.765 | 450 | 277 | 173 | 10.2 | -2.3 | ... | 30 | 56 | 6 | 57 | 188 | 55.6 | 4 | 2022 | Kansas City Chiefs | Philadelphia Eagles |
| 669 | Seattle Seahawks | 9 | 8 | 0.0 | 0.529 | 407 | 401 | 6 | 0.4 | -0.8 | ... | 30 | 50 | 6 | 54 | 206 | 51.6 | 4 | 2022 | Kansas City Chiefs | Philadelphia Eagles |
| 670 | Los Angeles Rams | 5 | 12 | 0.0 | 0.294 | 307 | 384 | -77 | -4.5 | 0.5 | ... | 16 | 37 | 4 | 75 | 180 | 52.0 | 2 | 2022 | Kansas City Chiefs | Philadelphia Eagles |
| 671 | Arizona Cardinals | 4 | 13 | 0.0 | 0.235 | 340 | 449 | -109 | -6.4 | 0.2 | ... | 17 | 40 | 3 | 77 | 189 | 43.6 | 4 | 2022 | Kansas City Chiefs | Philadelphia Eagles |
672 rows × 99 columns
Now that all our data is in order, it's time to add three more columns. We can use the SB Winner and SB Runner-Up columns to determine whether a particular team won the Super Bowl that year (Won SB), lost it (Lost SB), or made it at all (Made SB), and store those values as True or False. This will make our lives easier down the line. Once that's done, we no longer need the SB Winner and SB Runner-Up columns, so we can just drop them.
I'll show the first few Super Bowl teams to confirm that it worked.
nfl_df['Won SB'] = nfl_df.apply(lambda row: row['Team'] == row['SB Winner'], axis=1)
nfl_df['Lost SB'] = nfl_df.apply(lambda row: row['Team'] == row['SB Runner-Up'], axis=1)
nfl_df['Made SB'] = nfl_df.apply(lambda row: row['Team'] == row['SB Runner-Up'] or row['Won SB'], axis=1)
nfl_df = nfl_df.drop(columns={'SB Winner', 'SB Runner-Up'})
nfl_df[nfl_df['Made SB']].head(6)
| | Team | W | L | T | W-L% | PF | PA | PD | MoV | SoS | ... | offRec 20+ | offRec 40+ | offRec Lng | offRec Rec 1st | offRec Rec 1st% | offRec Rec FUM | Year | Won SB | Lost SB | Made SB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | Oakland Raiders | 11 | 5 | 0.0 | 0.688 | 450 | 304 | 146 | 9.1 | 1.5 | ... | 48 | 8 | 75T | 226 | 54.1 | 4 | 2002 | False | True | True |
| 24 | Tampa Bay Buccaneers | 12 | 4 | 0.0 | 0.750 | 346 | 196 | 150 | 9.4 | -0.6 | ... | 37 | 6 | 76 | 172 | 49.4 | 5 | 2002 | True | False | True |
| 32 | New England Patriots | 14 | 2 | NaN | 0.875 | 348 | 238 | 110 | 6.9 | 0.1 | ... | 44 | 8 | 82 | 177 | 55.3 | 3 | 2003 | True | False | True |
| 56 | Carolina Panthers | 11 | 5 | NaN | 0.688 | 325 | 304 | 21 | 1.3 | -2.2 | ... | 46 | 8 | 67 | 146 | 54.1 | 7 | 2003 | False | True | True |
| 64 | New England Patriots | 14 | 2 | NaN | 0.875 | 437 | 260 | 177 | 11.1 | 1.8 | ... | 53 | 10 | 50 | 193 | 65.9 | 4 | 2004 | True | False | True |
| 80 | Philadelphia Eagles | 13 | 3 | NaN | 0.813 | 386 | 260 | 126 | 7.9 | -2.3 | ... | 56 | 20 | 80T | 188 | 56.0 | 4 | 2004 | False | True | True |
6 rows × 100 columns
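The same flags can also be computed without a row-wise apply, since comparing two columns already yields a boolean Series. Here is a minimal sketch on a made-up three-row frame; `toy_df` and its contents are assumptions for illustration, and only the column names mirror nfl_df.

```python
import pandas as pd

# Hypothetical toy frame; only the column names mirror nfl_df
toy_df = pd.DataFrame({
    'Team': ['Oakland Raiders', 'Tampa Bay Buccaneers', 'New York Jets'],
    'SB Winner': ['Tampa Bay Buccaneers'] * 3,
    'SB Runner-Up': ['Oakland Raiders'] * 3,
})

# Element-wise comparisons return boolean Series directly
toy_df['Won SB'] = toy_df['Team'] == toy_df['SB Winner']
toy_df['Lost SB'] = toy_df['Team'] == toy_df['SB Runner-Up']
# A team made the Super Bowl if it either won or lost it
toy_df['Made SB'] = toy_df['Won SB'] | toy_df['Lost SB']

toy_df = toy_df.drop(columns=['SB Winner', 'SB Runner-Up'])
print(toy_df)
```

On 672 rows the speed difference is negligible, so this is mostly a readability choice.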
Next, we'll combine Year and Team into a single Team column. The Year Team combo gives us a better idea of which team we're looking at. It's the difference between saying "The New York Giants", which would refer to the franchise itself as a whole, and "The 2007 New York Giants", which is the team that beat the undefeated 2007 New England Patriots in the Super Bowl (suck it, Tom Brady).
nfl_df['Year'] = nfl_df['Year'].astype(pd.StringDtype())
nfl_df['Team'] = nfl_df['Year'] + ' ' + nfl_df['Team']
nfl_df = nfl_df.drop(columns={'Year'})
nfl_df
| | Team | W | L | T | W-L% | PF | PA | PD | MoV | SoS | ... | offRec TD | offRec 20+ | offRec 40+ | offRec Lng | offRec Rec 1st | offRec Rec 1st% | offRec Rec FUM | Won SB | Lost SB | Made SB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2002 New York Jets | 9 | 7 | 0.0 | 0.563 | 359 | 336 | 23 | 1.4 | 1.7 | ... | 25 | 44 | 3 | 47T | 190 | 57.8 | 4 | False | False | False |
| 1 | 2002 New England Patriots | 9 | 7 | 0.0 | 0.563 | 381 | 346 | 35 | 2.2 | 1.8 | ... | 28 | 37 | 3 | 49 | 184 | 49.2 | 5 | False | False | False |
| 2 | 2002 Miami Dolphins | 9 | 7 | 0.0 | 0.563 | 378 | 301 | 77 | 4.8 | 1.2 | ... | 18 | 38 | 5 | 77T | 155 | 57.2 | 6 | False | False | False |
| 3 | 2002 Buffalo Bills | 8 | 8 | 0.0 | 0.500 | 379 | 397 | -18 | -1.1 | 0.9 | ... | 24 | 45 | 13 | 73 | 218 | 57.8 | 4 | False | False | False |
| 4 | 2002 Pittsburgh Steelers | 10 | 5 | 1.0 | 0.656 | 390 | 345 | 45 | 2.8 | -0.1 | ... | 26 | 51 | 8 | 72 | 199 | 56.9 | 6 | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 667 | 2022 Atlanta Falcons | 7 | 10 | 0.0 | 0.412 | 365 | 386 | -21 | -1.2 | -0.9 | ... | 17 | 37 | 5 | 75T | 148 | 57.6 | 4 | False | False | False |
| 668 | 2022 San Francisco 49ers | 13 | 4 | 0.0 | 0.765 | 450 | 277 | 173 | 10.2 | -2.3 | ... | 30 | 56 | 6 | 57 | 188 | 55.6 | 4 | False | False | False |
| 669 | 2022 Seattle Seahawks | 9 | 8 | 0.0 | 0.529 | 407 | 401 | 6 | 0.4 | -0.8 | ... | 30 | 50 | 6 | 54 | 206 | 51.6 | 4 | False | False | False |
| 670 | 2022 Los Angeles Rams | 5 | 12 | 0.0 | 0.294 | 307 | 384 | -77 | -4.5 | 0.5 | ... | 16 | 37 | 4 | 75 | 180 | 52.0 | 2 | False | False | False |
| 671 | 2022 Arizona Cardinals | 4 | 13 | 0.0 | 0.235 | 340 | 449 | -109 | -6.4 | 0.2 | ... | 17 | 40 | 3 | 77 | 189 | 43.6 | 4 | False | False | False |
672 rows × 99 columns
Earlier, when I said that our data was in order, I neglected to mention that there are some issues. They just weren't that relevant until now. You may notice in the tables above that T (the number of tied games for a team in a season) is inconsistent: sometimes it's listed as 0, other times as NaN.
Intuitively, this makes sense: there are many different ways to score in the NFL (touchdowns are 6 points, field goals are 3 points, safeties and 2-point conversions are 2 points, point-after-touchdown kicks are 1 point), so games rarely end in a tie, and it seems the T column was simply left empty for seasons in which no game did. Let's double check to see if there are any other NaN values in nfl_df.
nfl_df.isna().sum()
Team 0
W 0
L 0
T 320
W-L% 0
...
offRec Rec 1st% 0
offRec Rec FUM 0
Won SB 0
Lost SB 0
Made SB 0
Length: 99, dtype: int64
nfl_df.isna().sum().sum()
320
Seems like only T has this issue. We'll keep things simple and turn all the NaNs into 0s.
nfl_df['T'] = nfl_df['T'].fillna(0)
nfl_df.isna().sum()
Team 0
W 0
L 0
T 0
W-L% 0
..
offRec Rec 1st% 0
offRec Rec FUM 0
Won SB 0
Lost SB 0
Made SB 0
Length: 99, dtype: int64
T should now be a floating-point column with no missing values.
nfl_df.dtypes
Team string
W int64
L int64
T float64
W-L% float64
...
offRec Rec 1st% float64
offRec Rec FUM int64
Won SB bool
Lost SB bool
Made SB bool
Length: 99, dtype: object
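The missing-value check and fill above can be reproduced on a tiny frame. This is a sketch only: `toy_df` is made up, and the final cast to int is an optional extra that the notebook itself does not perform (it leaves T as float64).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: ties recorded as 0, 1, or left missing
toy_df = pd.DataFrame({
    'Team': ['A', 'B', 'C', 'D'],
    'T': [0.0, np.nan, 1.0, np.nan],
})

# Count missing values per column, then across the whole frame
missing_per_column = toy_df.isna().sum()
total_missing = int(missing_per_column.sum())

# Treat a missing tie count as zero ties
toy_df['T'] = toy_df['T'].fillna(0)

# Optional extra step (not in the notebook): store ties as integers
toy_df['T'] = toy_df['T'].astype(int)
print(total_missing, toy_df['T'].tolist())
```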
You may notice that nearly all the stats we've been looking at are numeric values. However, some columns are not. They're objects, as we can see below:
nfl_df.select_dtypes(include='object')
| | Conference | offPass Lng | offRush Lng | defInt Lng | offRec Lng |
|---|---|---|---|---|---|
| 0 | AFC | 47T | 61 | 65 | 47T |
| 1 | AFC | 49 | 45 | 90 | 49 |
| 2 | AFC | 77T | 63T | 62T | 77T |
| 3 | AFC | 73T | 34 | 42 | 73 |
| 4 | AFC | 72 | 42 | 84T | 72 |
| ... | ... | ... | ... | ... | ... |
| 667 | NFC | 75T | 44 | 28T | 75T |
| 668 | NFC | 57 | 71 | 56 | 57 |
| 669 | NFC | 54 | 74 | 40T | 54 |
| 670 | NFC | 75T | 42 | 85T | 75 |
| 671 | NFC | 77 | 45 | 56 | 77 |
672 rows × 5 columns
For now, let's focus on the columns ending in - Lng. Let's take a look at one of these columns:
nfl_df['offRush Lng'].head(10)
0     61
1     45
2    63T
3     34
4     42
5     64
6    75T
7     67
8    39T
9     49
Name: offRush Lng, dtype: object
Some values have a 'T' after them. In NFL stat tables, a trailing 'T' on a Lng value conventionally marks that the play ended in a touchdown. In the context of offRush Lng, a value like 63T means the team's longest rush of the season gained 63 yards from the line of scrimmage and scored.
However, we don't particularly care whether the longest play scored: we just care about the largest number for the season (in this case, the longest offensive rushing play for a team during a certain season). Therefore, we'll simply remove the 'T' from all - Lng stats.
# Find every column whose name contains 'Lng'
lng_cols = [col for col in nfl_df.columns if 'Lng' in col]
for col in lng_cols:
    # Cast to string, remove the 'T' suffix, then convert to integers.
    # This replaces DataFrame.iteritems(), which was removed in pandas 2.0,
    # and avoids the chained assignment nfl_df[col][idx] = ...
    nfl_df[col] = nfl_df[col].astype(str).str.replace('T', '', regex=False).astype(int)
nfl_df.dtypes
Team string
W int64
L int64
T float64
W-L% float64
...
offRec Rec 1st% float64
offRec Rec FUM int64
Won SB bool
Lost SB bool
Made SB bool
Length: 99, dtype: object
Let's now remedy the other non-numeric columns, those being Team and Conference. We'll cast both to pandas' string type so that we can more easily use their data.
nfl_df['Conference'] = nfl_df['Conference'].astype(pd.StringDtype())
nfl_df['Team'] = nfl_df['Team'].astype(pd.StringDtype())
nfl_df.dtypes
Team string
W int64
L int64
T float64
W-L% float64
...
offRec Rec 1st% float64
offRec Rec FUM int64
Won SB bool
Lost SB bool
Made SB bool
Length: 99, dtype: object
Our dataframe is now all tidied up.
As you can imagine, not all stats will be useful in determining if a team is worthy to make the Super Bowl in a given year. It's time to finally discuss some of the stats and determine which ones we can filter out before we determine linear relationships and filter more out.
Here's a list of all the data from the standings data and an explanation for them:
- W - games won
- L - games lost
- T - games tied
- W-L% - win-loss percentage. Calculated by taking the number of wins and dividing by the total number of games; any ties are considered half a win
- PF - points scored by a team's offense
- PA - points scored against a team/points allowed by a team's defense
- PD - points differential. Found by taking the difference between PF and PA; positive differential means that a team put up more points than their opponents did all season long, and vice versa
- MoV - margin of victory. Found by dividing PD by # of games
- SoS - strength of schedule. Measures the strength of all the team's opponents; strength of team's opponents measured using SRS
- SRS - simple rating system. Rating that takes into account average point differential and strength of schedule and measures how good a team is (0.0 is average); can be calculated by taking sum of MoV and SoS, or OSRS and DSRS
- OSRS - offensive SRS. Rating that measures the quality of a team's offense relative to the average (0.0)
- DSRS - defensive SRS. Rating that measures the quality of a team's defense relative to the average (0.0)

From the standings, we can see that:
- W-L% is a good summary of W, L, and T, so we can drop these three values
- PD is a good summary of PF and PA, so we can drop these two values

nfl_df = nfl_df.drop(columns={'W', 'L', 'T', 'PF', 'PA'})
nfl_df
| | Team | W-L% | PD | MoV | SoS | SRS | OSRS | DSRS | Conference | Made Playoffs | ... | offRec TD | offRec 20+ | offRec 40+ | offRec Lng | offRec Rec 1st | offRec Rec 1st% | offRec Rec FUM | Won SB | Lost SB | Made SB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2002 New York Jets | 0.563 | 23 | 1.4 | 1.7 | 3.2 | 0.9 | 2.3 | AFC | True | ... | 25 | 44 | 3 | 47 | 190 | 57.8 | 4 | False | False | False |
| 1 | 2002 New England Patriots | 0.563 | 35 | 2.2 | 1.8 | 4.0 | 2.1 | 1.9 | AFC | False | ... | 28 | 37 | 3 | 49 | 184 | 49.2 | 5 | False | False | False |
| 2 | 2002 Miami Dolphins | 0.563 | 77 | 4.8 | 1.2 | 6.1 | 1.7 | 4.4 | AFC | False | ... | 18 | 38 | 5 | 77 | 155 | 57.2 | 6 | False | False | False |
| 3 | 2002 Buffalo Bills | 0.500 | -18 | -1.1 | 0.9 | -0.3 | 2.1 | -2.3 | AFC | False | ... | 24 | 45 | 13 | 73 | 218 | 57.8 | 4 | False | False | False |
| 4 | 2002 Pittsburgh Steelers | 0.656 | 45 | 2.8 | -0.1 | 2.7 | 3.1 | -0.4 | AFC | True | ... | 26 | 51 | 8 | 72 | 199 | 56.9 | 6 | False | False | False |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 667 | 2022 Atlanta Falcons | 0.412 | -21 | -1.2 | -0.9 | -2.1 | -0.1 | -2.0 | NFC | False | ... | 17 | 37 | 5 | 75 | 148 | 57.6 | 4 | False | False | False |
| 668 | 2022 San Francisco 49ers | 0.765 | 173 | 10.2 | -2.3 | 7.9 | 3.3 | 4.6 | NFC | True | ... | 30 | 56 | 6 | 57 | 188 | 55.6 | 4 | False | False | False |
| 669 | 2022 Seattle Seahawks | 0.529 | 6 | 0.4 | -0.8 | -0.5 | 1.9 | -2.4 | NFC | True | ... | 30 | 50 | 6 | 54 | 206 | 51.6 | 4 | False | False | False |
| 670 | 2022 Los Angeles Rams | 0.294 | -77 | -4.5 | 0.5 | -4.0 | -4.1 | 0.0 | NFC | False | ... | 16 | 37 | 4 | 75 | 180 | 52.0 | 2 | False | False | False |
| 671 | 2022 Arizona Cardinals | 0.235 | -109 | -6.4 | 0.2 | -6.2 | -1.9 | -4.3 | NFC | False | ... | 17 | 40 | 3 | 77 | 189 | 43.6 | 4 | False | False | False |
672 rows × 94 columns
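The standings formulas listed earlier (W-L%, PD, MoV, SRS) can be sanity-checked by hand. The sketch below plugs in the 2002 Pittsburgh Steelers row from the tables above; the variable names are mine, not the notebook's.

```python
# Values for the 2002 Pittsburgh Steelers, copied from the tables above
wins, losses, ties = 10, 5, 1
points_for, points_against = 390, 345
sos, osrs, dsrs = -0.1, 3.1, -0.4

games = wins + losses + ties

# A tie counts as half a win
win_loss_pct = (wins + 0.5 * ties) / games

point_diff = points_for - points_against
margin_of_victory = point_diff / games

# SRS two ways; the published inputs are already rounded to one decimal
srs_from_mov = round(margin_of_victory, 1) + sos
srs_from_split = osrs + dsrs

print(round(win_loss_pct, 3), point_diff, round(margin_of_victory, 1))
print(round(srs_from_mov, 1), round(srs_from_split, 1))
```

These reproduce the table's 0.656, 45, 2.8 and 2.7. Because the published columns are rounded, the two SRS sums can disagree by 0.1 for other teams: for the 2002 New York Jets, MoV + SoS is 1.4 + 1.7 = 3.1 while the listed SRS is 3.2.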
Next, we move onto offensive stats. A list of the stats can be found here and here. The prefix for the name of the column is given in parentheses, and the relevant stat is in the bullet point below. For example, offPass Att is represented below under "Passing (offPass)" and Att.
Note that most offensive stats are gained by the offense. For instance, offPass Att can be read as "passing attempts gained by the offense". Defensive stats like interceptions and sacks would be the exception, so something like offPass INT can be read as "interceptions gained by opposition's defense".
Passing (offPass)
- Att - passing attempts. Number of times a player threw the ball forward, attempting to complete a pass
- Cmp - completions/completed passes
- Cmp% - completion percentage. Found by dividing Cmp by Att
- Yds/Att - yards gained per passing attempt. Found by dividing Pass Yds by Att
- Pass Yds - passing yards. Total yards gained passing the ball
- TD - passing touchdowns scored
- INT - interceptions. Number of times a player from the other team picks off a pass thrown by the offense
- Rate - passer/QB rating. Metric of how well a quarterback (QB) has been playing; measured by Cmp%, Yds/Att, TD % (found by dividing TD by Att), and INT % (found by dividing INT by Att)
- 1st - first downs achieved from passing the ball
- 1st% - first down percentage from passing the ball. Found by dividing 1st by (Att + Sck)
- 20+ - passing completions >=20 yards
- 40+ - passing completions >=40 yards
- Lng - longest passing completion in yards
- Sck - sacks. Number of times QB is sacked by defense
- SckY - yards lost on sacks. Total number of yards lost from the line of scrimmage by QB who was sacked

Rushing (offRush)
- Att - rushing attempts. Number of times a player tried rushing with ball in hand (AKA carrying)
- Rush Yds - rushing yards. Total yards gained rushing the ball
- YPC - yards gained per carry. Found by dividing Rush Yds by Att
- TD - rushing touchdowns scored
- 20+ - rushing plays >=20 yards
- 40+ - rushing plays >=40 yards
- Lng - longest rushing play in yards
- Rush 1st - first downs achieved from rushing the ball
- Rush 1st% - first down percentage from rushing the ball. Found by dividing Rush 1st by (Att + sacks)
- Rush FUM - rushing fumbles. Number of times football is dropped before a rushing play is blown dead

Receiving (offRec)
- Rec - receptions. Number of times a player catches a forward pass
- Yds - receiving yards. Total yards gained when catching the ball
- Yds/Rec - receiving yards gained per reception. Found by dividing Yds by Rec
- TD - receiving touchdowns scored
- 20+ - receptions >=20 yards
- 40+ - receptions >=40 yards
- Lng - longest reception in yards
- Rec 1st - first downs achieved from receiving the ball
- Rec 1st% - first down percentage from receiving the ball. Found by dividing Rec 1st by (Rec + sacks)
- Rec FUM - receiving fumbles. Number of times a player drops the football before a receiving play is blown dead

Scoring (offScor)
- Rsh TD - rushing touchdowns scored
- Rec TD - receiving touchdowns scored
- Tot TD - total touchdowns scored
- 2-PT - 2-point conversions scored

Downs (offDown)
- 3rd Att - third down attempts
- 3rd Md - third down conversions
- 4th Att - fourth down attempts
- 4th Md - fourth down conversions
- Rec 1st - first downs achieved by receiving
- Rec 1st% - first down receiving percentage. Found by dividing Rec 1st by receiving attempts and sacks
- Rush 1st - first downs achieved by rushing
- Rush 1st% - first down rushing percentage. Found by dividing Rush 1st by rushing attempts and sacks
- Scrm Plys - plays from scrimmage. Number of times a play is attempted from the line of scrimmage

Here are some takeaways:
- Rsh TD, Rec TD, Rec 1st, Rec 1st%, Rush 1st and Rush 1st% are listed more than once. We can drop the repeats from the dataframe
- Net yards gained per passing attempt (NY/A) is found by dividing net passing yards (offPass Pass Yds - offPass SckY) by total passing attempts (offPass Att + offPass Sck). According to Bud Goode, the inventor of this stat, the team with the higher value wins about 80% of the time, so it stands to reason the more successful teams will have higher values of this particular stat. Think of this stat as a better version of Yds/Att. We'll add this stat to our overall dataframe as N Yds/Att

nfl_df['Net Yd'] = nfl_df['offPass Pass Yds'] - nfl_df['offPass SckY']
nfl_df['Pass Att'] = nfl_df['offPass Att'] + nfl_df['offPass Sck']
nfl_df['N Yds/Att'] = nfl_df['Net Yd'] / nfl_df['Pass Att']
# Truncate `NY/A` to two decimal places
nfl_df['N Yds/Att'] = nfl_df['N Yds/Att'].apply(lambda x: math.trunc(100 * x) / 100)
# Drop repeat columns
nfl_df = nfl_df.drop(
    columns={'offScor Rsh TD',
             'offScor Rec TD',
             'offDown Rec 1st',
             'offDown Rec 1st%',
             'offDown Rush 1st',
             'offDown Rush 1st%'
             }
)
# Drop columns used to help calculate `NY/A`
nfl_df = nfl_df.drop(columns={'Net Yd', 'Pass Att'})
nfl_df
| | Team | W-L% | PD | MoV | SoS | SRS | OSRS | DSRS | Conference | Made Playoffs | ... | offRec 20+ | offRec 40+ | offRec Lng | offRec Rec 1st | offRec Rec 1st% | offRec Rec FUM | Won SB | Lost SB | Made SB | N Yds/Att |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2002 New York Jets | 0.563 | 23 | 1.4 | 1.7 | 3.2 | 0.9 | 2.3 | AFC | True | ... | 44 | 3 | 47 | 190 | 57.8 | 4 | False | False | False | 6.61 |
| 1 | 2002 New England Patriots | 0.563 | 35 | 2.2 | 1.8 | 4.0 | 2.1 | 1.9 | AFC | False | ... | 37 | 3 | 49 | 184 | 49.2 | 5 | False | False | False | 5.62 |
| 2 | 2002 Miami Dolphins | 0.563 | 77 | 4.8 | 1.2 | 6.1 | 1.7 | 4.4 | AFC | False | ... | 38 | 5 | 77 | 155 | 57.2 | 6 | False | False | False | 6.02 |
| 3 | 2002 Buffalo Bills | 0.500 | -18 | -1.1 | 0.9 | -0.3 | 2.1 | -2.3 | AFC | False | ... | 45 | 13 | 73 | 218 | 57.8 | 4 | False | False | False | 5.99 |
| 4 | 2002 Pittsburgh Steelers | 0.656 | 45 | 2.8 | -0.1 | 2.7 | 3.1 | -0.4 | AFC | True | ... | 51 | 8 | 72 | 199 | 56.9 | 6 | False | False | False | 6.55 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 667 | 2022 Atlanta Falcons | 0.412 | -21 | -1.2 | -0.9 | -2.1 | -0.1 | -2.0 | NFC | False | ... | 37 | 5 | 75 | 148 | 57.6 | 4 | False | False | False | 5.97 |
| 668 | 2022 San Francisco 49ers | 0.765 | 173 | 10.2 | -2.3 | 7.9 | 3.3 | 4.6 | NFC | True | ... | 56 | 6 | 57 | 188 | 55.6 | 4 | False | False | False | 7.10 |
| 669 | 2022 Seattle Seahawks | 0.529 | 6 | 0.4 | -0.8 | -0.5 | 1.9 | -2.4 | NFC | True | ... | 50 | 6 | 54 | 206 | 51.6 | 4 | False | False | False | 6.35 |
| 670 | 2022 Los Angeles Rams | 0.294 | -77 | -4.5 | 0.5 | -4.0 | -4.1 | 0.0 | NFC | False | ... | 37 | 4 | 75 | 180 | 52.0 | 2 | False | False | False | 5.26 |
| 671 | 2022 Arizona Cardinals | 0.235 | -109 | -6.4 | 0.2 | -6.2 | -1.9 | -4.3 | NFC | False | ... | 40 | 3 | 77 | 189 | 43.6 | 4 | False | False | False | 5.10 |
672 rows × 89 columns
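To make the N Yds/Att arithmetic concrete, here is a minimal standalone sketch of the same calculation. The function name `net_yards_per_attempt` and the season totals passed to it are made up for illustration; only the formula and the truncation step follow the cell above.

```python
import math

def net_yards_per_attempt(pass_yds, sack_yds, pass_att, sacks):
    """Net passing yards divided by passing attempts plus sacks."""
    net_yards = pass_yds - sack_yds
    total_attempts = pass_att + sacks
    # Truncate (not round) to two decimals, mirroring the math.trunc step above
    return math.trunc(100 * net_yards / total_attempts) / 100

# Made-up season totals, not taken from the dataset
ny_a = net_yards_per_attempt(pass_yds=4000, sack_yds=200, pass_att=550, sacks=30)
print(ny_a)
```

With these inputs the plain Yds/Att would be 4000 / 550, about 7.27, so the sack adjustment lowers the figure noticeably for a team that gets sacked often.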
Next, we move onto defensive stats. The prefix for the name of the column is given in parentheses, and the relevant stat is in the bullet point below. For example, defPass Att is represented below under "Passing (defPass)" and Att.
Note that most defensive stats are gained by the defense. For instance, defPass INT can be read as "interceptions gained/forced by the defense". Offensive stats like touchdowns and yards per attempt would be the exception, so something like defPass TD can be read as "touchdowns allowed/given up by defense".
Passing (defPass)
- Att - passing attempts. Number of passing attempts that defense allowed
- Cmp - completions/completed passes. Number of completions allowed by defense
- Cmp% - completion percentage by opposing offenses. Found by dividing Cmp by Att
- Yds - passing yards allowed by defense
- Yds/Att - passing yards per attempt by opposing offenses. Found by dividing Yds by Att
- TD - passing touchdowns given up by defense
- INT - interceptions forced by defense
- 1st - first downs gained by opposing offenses from passing the ball
- 1st% - first down percentage gained by opposing offenses from passing the ball. Found by dividing 1st by (Att + Sck)
- Sck - sacks forced by the defense

Rushing (defRush)
- Att - rushing attempts. Number of rushing attempts that defense allowed
- Rush Yds - rushing yards allowed by defense
- YPC - yards per carry. Average number of yards allowed by defense per carry
- TD - rushing touchdowns allowed by defense
- Rush 1st - first downs gained by opposing offenses from rushing the ball
- Rush 1st% - first down percentage gained by opposing offenses from rushing the ball. Found by dividing Rush 1st by (Att + Sck)

Scoring (defScor)
- FR TD - fumble recovery touchdowns. Number of fumbles recovered by the defense and returned for a touchdown
- SFTY - safeties. Number of safeties that the defense forced
- INT TD - interception touchdowns. Number of interceptions that the defense returned for a touchdown

Downs (defDown)
- 3rd Att - third down attempts. Number of third downs that opposing offenses attempted
- 3rd Md - third down conversions. Number of third downs that opposing offenses converted on
- 4th Att - fourth down attempts. Number of fourth downs that opposing offenses attempted
- 4th Md - fourth down conversions. Number of fourth downs that opposing offenses converted on
- Rush 1st - first downs rushing. Number of first downs achieved by opposing rushing offenses
- Rush 1st% - first downs rushing percentage. Found by dividing Rush 1st by the sum of rushing attempts and sacks
- Scrm Plys - plays from scrimmage. Number of times a play is attempted from the line of scrimmage

Fumbles (defFumb)
- FF - fumbles forced. Number of fumbles forced by defense (but not necessarily gained possession of)
- FR - fumble recoveries. Number of fumbles that the defense forced and gained possession of
- FR TD - fumble recovery touchdowns. Number of fumbles recovered by the defense and returned for a touchdown

Interceptions (defInt)
- INT - interceptions
- INT TD - interception touchdowns. Number of interceptions that the defense returned for a touchdown
- INT Yds - interception return yardage. Number of yards compiled by defense from all interceptions
- Lng - longest interception return in yards

Here are some takeaways:
- Rush 1st, Rush 1st%, INT, and INT TD are listed more than once. We can drop the repeats from the dataframe
- Turnover margin is found by subtracting the total number of giveaways (offPass INT + fumbles lost) from the total number of takeaways (defPass INT + defFumb FR). You may notice that there isn't a metric defined for "fumbles lost". That's because I couldn't find a reliable data source stretching back to 2002 that has compiled this data. The closest I could find was from 2003 onwards. However, we can estimate this value by summing total offensive fumbles (offRush Rush FUM + offRec Rec FUM) and multiplying by 0.824, since that is the average rate at which the opposing defense recovers the fumbled ball (source: https://www.footballperspective.com/the-definitive-analysis-of-offensive-fumbles/). We'll add this stat to our overall dataframe as Turn Marg

# Estimate fumbles lost by the offense
nfl_df['Fumb Lost'] = 0.824 * (nfl_df['offRush Rush FUM'] + nfl_df['offRec Rec FUM'])
nfl_df['Gives'] = nfl_df['offPass INT'] + nfl_df['Fumb Lost']
nfl_df['Takes'] = nfl_df['defPass INT'] + nfl_df['defFumb FR']
nfl_df['Turn Marg'] = nfl_df['Takes'] - nfl_df['Gives']
# Make margin an integer value
nfl_df['Turn Marg'] = nfl_df['Turn Marg'].apply(lambda x: round(x))
# Drop repeat columns
nfl_df = nfl_df.drop(
    columns={'defDown Rush 1st',
             'defDown Rush 1st%',
             'defInt INT',
             'defInt INT TD'
             }
)
# Drop columns used to help calculate `Turn Marg`
nfl_df = nfl_df.drop(columns={'Fumb Lost', 'Gives', 'Takes'})
nfl_df
| | Team | W-L% | PD | MoV | SoS | SRS | OSRS | DSRS | Conference | Made Playoffs | ... | offRec 40+ | offRec Lng | offRec Rec 1st | offRec Rec 1st% | offRec Rec FUM | Won SB | Lost SB | Made SB | N Yds/Att | Turn Marg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2002 New York Jets | 0.563 | 23 | 1.4 | 1.7 | 3.2 | 0.9 | 2.3 | AFC | True | ... | 3 | 47 | 190 | 57.8 | 4 | False | False | False | 6.61 | -2 |
| 1 | 2002 New England Patriots | 0.563 | 35 | 2.2 | 1.8 | 4.0 | 2.1 | 1.9 | AFC | False | ... | 3 | 49 | 184 | 49.2 | 5 | False | False | False | 5.62 | -7 |
| 2 | 2002 Miami Dolphins | 0.563 | 77 | 4.8 | 1.2 | 6.1 | 1.7 | 4.4 | AFC | False | ... | 5 | 77 | 155 | 57.2 | 6 | False | False | False | 6.02 | -10 |
| 3 | 2002 Buffalo Bills | 0.500 | -18 | -1.1 | 0.9 | -0.3 | 2.1 | -2.3 | AFC | False | ... | 13 | 73 | 218 | 57.8 | 4 | False | False | False | 5.99 | -20 |
| 4 | 2002 Pittsburgh Steelers | 0.656 | 45 | 2.8 | -0.1 | 2.7 | 3.1 | -0.4 | AFC | True | ... | 8 | 72 | 199 | 56.9 | 6 | False | False | False | 6.55 | -15 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 667 | 2022 Atlanta Falcons | 0.412 | -21 | -1.2 | -0.9 | -2.1 | -0.1 | -2.0 | NFC | False | ... | 5 | 75 | 148 | 57.6 | 4 | False | False | False | 5.97 | -3 |
| 668 | 2022 San Francisco 49ers | 0.765 | 173 | 10.2 | -2.3 | 7.9 | 3.3 | 4.6 | NFC | True | ... | 6 | 57 | 188 | 55.6 | 4 | False | False | False | 7.10 | 12 |
| 669 | 2022 Seattle Seahawks | 0.529 | 6 | 0.4 | -0.8 | -0.5 | 1.9 | -2.4 | NFC | True | ... | 6 | 54 | 206 | 51.6 | 4 | False | False | False | 6.35 | 6 |
| 670 | 2022 Los Angeles Rams | 0.294 | -77 | -4.5 | 0.5 | -4.0 | -4.1 | 0.0 | NFC | False | ... | 4 | 75 | 180 | 52.0 | 2 | False | False | False | 5.26 | 1 |
| 671 | 2022 Arizona Cardinals | 0.235 | -109 | -6.4 | 0.2 | -6.2 | -1.9 | -4.3 | NFC | False | ... | 3 | 77 | 189 | 43.6 | 4 | False | False | False | 5.10 | -9 |
672 rows × 86 columns
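The Turn Marg estimate can be written as a small standalone function to show how each piece feeds the result. This is a sketch: the function name, its parameters, and the sample totals are made up, while the 0.824 default is the notebook's own figure.

```python
def estimated_turnover_margin(off_int, rush_fum, rec_fum, def_int, def_fr,
                              fumble_loss_rate=0.824):
    """Takeaways minus giveaways, with fumbles lost estimated from total fumbles."""
    # Estimated fumbles lost: a fixed share of all offensive fumbles
    fumbles_lost = fumble_loss_rate * (rush_fum + rec_fum)
    giveaways = off_int + fumbles_lost
    takeaways = def_int + def_fr
    # Round to the nearest whole turnover, as in the cell above
    return round(takeaways - giveaways)

# Made-up season totals, not taken from the dataset
margin = estimated_turnover_margin(off_int=12, rush_fum=10, rec_fum=5,
                                   def_int=15, def_fr=10)
print(margin)
```

Exposing the rate as a parameter makes it easy to see how sensitive the margin is to that one assumption: with the same totals, a rate of 0.5 would give a noticeably more favourable margin.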
Now that we've talked about all the stats, let's graph them against W-L% to see which stats are a good indicator to predict regular season wins. Remember, we're trying to use regular season wins to predict a Super Bowl winner, so it's relevant to us to see which stats are good at predicting regular season wins.
plot_stats
To start off, let's create dictionaries that map each stat to its full description. These will be helpful when we make a helper function to graph each stat against W-L%.
standings_dict = {'PD': 'Point Differential',
'MoV': 'Margin of Victory',
'SoS': 'Strength of Schedule',
'SRS': 'Simple Rating System',
'OSRS': 'Offensive Simple Rating System',
'DSRS': 'Defensive Simple Rating System'
}
off_pass_dict = {'offPass Att': 'Passing Attempts',
'offPass Yds/Att': 'Yards Gained Per Passing Attempt',
'N Yds/Att': 'Net Yards Gained Per Passing Attempt',
'offPass Pass Yds': 'Passing Yards',
'offPass TD': 'Passing Touchdowns',
'offPass Rate': 'Passer Rating',
'offPass Sck': 'Sacks',
'offPass SckY': 'Yards Lost from Sacks'
}
off_comp_dict = {'offPass Cmp': 'Passing Completions',
'offPass Cmp %': 'Passing Completion Percentage',
'offPass 1st': '1st Downs Gained from Passing Completions',
'offPass 1st%': '1st Down % for Passing Completions',
'offPass 20+': 'Passing Completions >= 20 Yards',
'offPass 40+': 'Passing Completions >= 40 Yards',
'offPass Lng': 'Longest Passing Completion',
}
off_rush_dict = {'offRush Att': 'Rushing Attempts',
'offRush Rush Yds': 'Rushing Yards',
'offRush YPC': 'Yards Gained Per Carry',
'offRush TD': 'Rushing Touchdowns',
'offRush 20+': 'Rushes >=20 Yards',
'offRush 40+': 'Rushes >=40 Yards',
'offRush Lng': 'Longest Rushing Play',
'offRush Rush 1st': '1st Downs Gained from Rushing Attempts',
'offRush Rush 1st%': '1st Down % for Rushing Attempts',
'offRush Rush FUM': 'Rushing Fumbles',
}
off_rec_dict = {'offRec Rec': 'Receptions',
'offRec Yds': 'Receiving Yards',
'offRec Yds/Rec': 'Receiving Yards Gained Per Reception',
'offRec TD': 'Receiving Touchdowns',
'offRec 20+': 'Receptions >=20 Yards',
'offRec 40+': 'Receptions >=40 Yards',
'offRec Lng': 'Longest Reception Play',
'offRec Rec 1st': '1st Downs Gained from Receptions',
'offRec Rec 1st%': '1st Downs % for Receptions',
'offRec Rec FUM': 'Receiving Fumbles',
}
off_scor_dict = {'offScor Tot TD': 'Total Touchdowns',
'offScor 2-PT': '2-Point Conversions'
}
off_down_dict = {'offDown 3rd Att': '3rd Down Attempts',
'offDown 3rd Md': '3rd Down Conversions',
'offDown 4th Att': '4th Down Attempts',
'offDown 4th Md': '4th Down Conversions',
'offDown Scrm Plys': 'Offensive Plays from Line of Scrimmage'
}
def_pass_dict = {'defPass Att': 'Passing Attempts Allowed',
'defPass Cmp': 'Passing Completions Allowed',
'defPass Cmp %': 'Passing Completion % by Opp. Offenses',
'defPass Yds': 'Passing Yards Allowed',
'defPass Yds/Att': 'Yards Allowed Per Passing Attempt',
'defPass TD': 'Passing Touchdowns Allowed',
'defPass INT': 'Interceptions Forced',
'defPass 1st': '1st Downs Allowed from Passing Completions',
'defPass 1st%': '1st Down % for Passing Completions by Opp. Offenses',
'defPass Sck': 'Sacks Forced'
}
def_rush_dict = {'defRush Att': 'Rushing Attempts Allowed',
'defRush Rush Yds': 'Rushing Yards Allowed',
'defRush YPC': 'Yards Gained Per Carry by Opp. Offenses',
'defRush TD': 'Rushing Touchdowns Allowed',
'defRush Rush 1st': '1st Downs Allowed from Rushing Attempts',
'defRush Rush 1st%': '1st Down % for Rushing Attempts by Opp. Offenses'
}
def_scor_dict = {'defScor FR TD': 'Fumble Recoveries Touchdowns',
'defScor SFTY': 'Safeties Forced',
'defScor INT TD': 'Interception Touchdowns'
}
def_down_dict = {'defDown 3rd Att': '3rd Down Attempts Allowed',
'defDown 3rd Md': '3rd Down Conversions Allowed',
'defDown 4th Att': '4th Down Attempts Allowed',
'defDown 4th Md': '4th Down Conversions Allowed',
'defDown Scrm Plys': 'Defensive Plays from Line of Scrimmage'
}
def_int_dict = {'defInt INT Yds': 'Interception Return Yardage',
'defInt Lng': 'Longest Interception Play',
'Turn Marg': 'Turnover Margin'
}
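Since every key in these dictionaries has to match a column name in nfl_df exactly (including spacing, as in 'offPass Cmp %'), a quick check before plotting can catch typos early. The sketch below uses a hypothetical helper, `missing_columns`, and a made-up toy frame; neither is part of the original notebook.

```python
import pandas as pd

def missing_columns(df, stat_dict):
    """Return the dictionary keys that are not columns of df."""
    return [key for key in stat_dict if key not in df.columns]

# Hypothetical toy frame and dictionary; the second key is deliberately misspelled
toy_df = pd.DataFrame({'PD': [23, 35], 'MoV': [1.4, 2.2]})
toy_dict = {'PD': 'Point Differential', 'Mov': 'Margin of Victory'}

missing = missing_columns(toy_df, toy_dict)
print(missing)
```

An empty list means every key can be plotted; anything else would otherwise surface later as a KeyError inside the plotting loop.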
Next, we'll make the helper function, plot_stats, for all the linear regression plots. This helper will:
- Take in a dictionary of stats (stat_dict), the number of rows and columns for the overall plot, the category being plotted, and the height and width of the final figure
- Create a grid of subplots from rows and cols
- Iterate through stat_dict to add each subplot and its corresponding regression line to the overall figure
- Set the height and width of the final figure
def plot_stats(stat_dict, rows, cols, category, height, width):
    subtitles = []
    for key in stat_dict.keys():
        subtitles.append(f'{key} vs W-L%')
    # Create subplots
    fig = make_subplots(rows=rows, cols=cols, subplot_titles=subtitles, y_title='Win-Loss %')
    fig.update_layout(title_text=f'Scatter Plot Distribution of {category} over Win-Loss Percentage')
    row = 1
    col = 1
    # Go through each stat for the specified dictionary and add the subplots to the overall figure
    for key, value in stat_dict.items():
        # Plot the data, update x-axis labels
        fig.add_trace(go.Scatter(
            x=nfl_df[key],
            y=nfl_df['W-L%'],
            mode='markers',
            # xaxis=standings_val,
            # yaxis="Win/Loss Percentage",
            name=value),
            row=row, col=col)
        fig.update_xaxes(title_text=value, row=row, col=col)
        # Create linear regression model and line (Q() quotes column names that contain spaces or symbols)
        model = smf.ols(f'Q("W-L%") ~ Q("{key}")', data=nfl_df).fit()
        x_vals = np.linspace(nfl_df[key].min(), nfl_df[key].max(), 100)
        y_vals = model.predict(pd.DataFrame({key: x_vals}))
        fig.add_trace(go.Scatter(x=x_vals, y=y_vals, name='Regression Fit', line=go.scatter.Line(color='black')), row=row, col=col)
        # Increment row and col to get ready for next subplot position
        row = row + 1 if col == cols else row
        col = 1 if col == cols else col + 1
        # Display rsquared values
        rsquared = str(model.rsquared)
        print(f'{key} r-squared value: ' + rsquared)
    # Remove the legend
    for trace in fig['data']:
        if trace['name']:
            trace['showlegend'] = False
    # Adjust figure width and height after all subplots are added
    fig.update_layout(height=height, width=width)
    return fig
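The row and column bookkeeping inside plot_stats is easy to get wrong, so here is a minimal standalone sketch of just that part. Note that grid_positions is a hypothetical helper, not part of the notebook; it applies the same update rule and returns the 1-based slot each subplot would land in.

```python
def grid_positions(n_plots, cols):
    """Return the 1-based (row, col) slot for each of n_plots subplots,
    filling the grid left to right, then top to bottom.
    Hypothetical helper for illustration; not part of the notebook."""
    positions = []
    row, col = 1, 1
    for _ in range(n_plots):
        positions.append((row, col))
        # Same update rule as in plot_stats: wrap to the next row
        # once the last column has been filled
        row = row + 1 if col == cols else row
        col = 1 if col == cols else col + 1
    return positions


# Five stats laid out on a grid that is three columns wide
positions = grid_positions(5, 3)
print(positions)
```

With five stats and three columns, the first row fills completely and the remaining two stats go into the second row, which is why rows * cols has to be at least the number of entries in stat_dict.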
Now, it's time to graph each stat against W-L%. Let's start with the standings data.
fig = plot_stats(standings_dict, 2, 3, 'Overall Standing Stats', 600, 1000)
fig.show()
PD r-squared value: 0.8307928619160826
MoV r-squared value: 0.8313496121253985
SoS r-squared value: 0.03805714184869102
SRS r-squared value: 0.7681091417197841
OSRS r-squared value: 0.5598900635898558
DSRS r-squared value: 0.3899552403777341
From the graph, it appears that nearly all these stats, with the exception of SoS, follow a linear relationship, as evidenced by their r-squared values. These are actually some of the stats most strongly correlated with W-L%, as we'll see in a moment, so they clearly do a good job of predicting a team's regular season success. SoS, meanwhile, seems to have almost nothing to do with W-L%, which is fair considering the methods for assessing it are flawed.
One thing to note is that the r-squared values won't be very large for most of these statistics, as it's hard to properly fit these various values with a simple regression line. However, it's evident that most of these stats have a clear linear relationship with W-L% with all the data clustering around the trendlines, and that's really all we're looking for here. As such, any stats that have an r-squared value less than 0.1 will be dropped at the end, as we need at least 10% of the variation in the data to be explained by the linear model to meaningfully talk about a linear relationship.
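To make the 0.1 cutoff concrete, here is a minimal sketch of what an r-squared value measures for a one-variable regression, written without statsmodels. The function name r_squared and the toy numbers are assumptions made up for illustration; they are not taken from nfl_df.

```python
def r_squared(xs, ys):
    """R-squared of the least-squares line y = a + b*x fitted to (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left over after the fit vs. total variation in y
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


THRESHOLD = 0.1  # same cutoff as described in the text

# Hypothetical toy data (not from nfl_df)
point_diff = [-100, -50, 0, 50, 100]
schedule = [1.0, -1.0, 0.5, -1.0, 1.0]
win_pct = [0.25, 0.40, 0.50, 0.60, 0.75]

for name, xs in [('point_diff', point_diff), ('schedule', schedule)]:
    r2 = r_squared(xs, win_pct)
    verdict = 'keep' if r2 >= THRESHOLD else 'drop'
    print(f'{name}: r-squared = {r2:.3f} -> {verdict}')
```

In this toy case the first stat explains almost all of the variation in win percentage and would be kept, while the second explains essentially none of it and would fall under the threshold.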
Next, let's look at offensive passing stats. offPass will be broken up into 2 plots: one for general passing stats and one for passing completion stats. We'll start with the general passing stats.
fig = plot_stats(off_pass_dict, 3, 3, 'Offensive Passing Stats', 700, 1175)
fig.show()
offPass Att r-squared value: 0.004366137926650526
offPass Yds/Att r-squared value: 0.29621559021889976
N Yds/Att r-squared value: 0.37212513677430425
offPass Pass Yds r-squared value: 0.09311056256394501
offPass TD r-squared value: 0.27727308719684773
offPass Rate r-squared value: 0.39061653022348597
offPass Sck r-squared value: 0.22258690487869537
offPass SckY r-squared value: 0.22480963407187593
From the graph, we can see that the stats, with the exception of offPass Att and offPass Pass Yds, all follow a linear relationship, as evidenced by their r-squared values. offPass Rate has the highest correlation with W-L%, which makes sense, as passer rating measures how efficiently the QB throws the ball. Most people agree that quarterback is the most important position in the game, so it makes sense that the better a QB performs, the higher a team's win-loss percentage would be.
Now, let's look at the other half of the offPass stats, those being the completion stats.
fig = plot_stats(off_comp_dict, 3, 3, 'Offensive Passing Completion Stats', 700, 1175)
fig.show()
offPass Cmp r-squared value: 0.023447782103629078
offPass Cmp % r-squared value: 0.18769273252266583
offPass 1st r-squared value: 0.10785827463701492
offPass 1st% r-squared value: 0.33968359400803994
offPass 20+ r-squared value: 0.10182731860884042
offPass 40+ r-squared value: 0.09546390186057618
offPass Lng r-squared value: 0.007615171296141199
From the graph, we can see that these stats aren't as highly correlated with W-L%, as evidenced by their r-squared values. The exception to this is offPass 1st%, which adds up, as getting more first downs gives offenses more opportunities to score. This stat is also highly correlated with N Yds/Att, so it's a highly valuable stat that is worth considering for our ML model.
Overall, the lower correlation numbers do make some amount of sense. Football isn't a pass-only sport; it also features a lot of rushing plays, which is what we'll look at for our next plot.
fig = plot_stats(off_rush_dict, 4, 3, 'Offensive Rushing Stats', 800, 1350)
fig.show()
offRush Att r-squared value: 0.17935555180860496
offRush Rush Yds r-squared value: 0.11307438636327494
offRush YPC r-squared value: 0.013798571844611862
offRush TD r-squared value: 0.2266979522843413
offRush 20+ r-squared value: 0.04480478657713871
offRush 40+ r-squared value: 0.009598161796905313
offRush Lng r-squared value: 0.0006404931859593788
offRush Rush 1st r-squared value: 0.16253914592076268
offRush Rush 1st% r-squared value: 0.0667849066494638
offRush Rush FUM r-squared value: 0.03005225262769351
Just like with the offPass stats, we can see that these stats aren't highly correlated with W-L%, as evidenced by their r-squared values. The stats that stick out are offRush TD, offRush Att, and offRush Rush 1st on the high end, and offRush YPC on the low end. offRush TD intuitively makes sense, as the more touchdowns you get, the more points you accrue and the better your chance of winning a game. offRush YPC, however, is far more interesting: with an r-squared of roughly 0.01, yards per carry has almost no linear relationship with W-L%. What this tells us is that rushing volume (attempts and first downs) tracks with winning much more than per-carry efficiency does, possibly because teams that are already ahead run the ball more to use up the clock.
For our next plot, let's look at the offensive receiving stats.
fig = plot_stats(off_rec_dict, 4, 3, 'Offensive Receiving Stats', 800, 1350)
fig.show()
offRec Rec r-squared value: 0.023457710540459864
offRec Yds r-squared value: 0.09314865858166044
offRec Yds/Rec r-squared value: 0.11544782093303574
offRec TD r-squared value: 0.27727308719684773
offRec 20+ r-squared value: 0.1025968453514966
offRec 40+ r-squared value: 0.09562158784274843
offRec Lng r-squared value: 0.007509187012124774
offRec Rec 1st r-squared value: 0.10779204544783028
offRec Rec 1st% r-squared value: 0.1899125840695126
offRec Rec FUM r-squared value: 3.519239598614998e-05
Just like with the last couple of graphs, we can see that these stats aren't highly correlated with W-L%, as evidenced by their r-squared values. The stats that stick out in this regard are offRec TD, offRec Rec 1st%, and offRec Yds/Rec. offRec TD is intuitive for reasons similar to offRush TD, so we won't discuss it. offRec Yds/Rec plays a role similar to offRush YPC, although interestingly it has a larger number of yards associated with it. This can likely be explained by the fact that a completed pass usually travels downfield through the air before the receiver is tackled, while a rush starts in the backfield, behind the line of scrimmage. Longer gains lead to more 1st downs being recorded, which is what the last stat, offRec Rec 1st%, focuses on. Its relationship with W-L% can be explained similarly to the offPass 1st% stat.
Now, onto the offensive scoring stats.
fig = plot_stats(off_scor_dict, 1, 2, 'Offensive Scoring Stats', 350, 750)
fig.show()
offScor Tot TD r-squared value: 0.48640672416795994
offScor 2-PT r-squared value: 0.00129017616375382
It's evident that offScor 2-PT is not at all correlated with W-L%, which is honestly a little surprising. I would've thought that netting 8 points from a touchdown, as opposed to 6 or 7, would increase the chance of a team winning. However, teams usually only go for 2 when up by a lot or towards the end of a game when they're faced with a dire situation, so in this context it makes more sense. It's no surprise that total touchdowns (offScor Tot TD) are a good measure of success for teams, and we've already discussed that at length.
Let's take a look at offensive down stats.
fig = plot_stats(off_down_dict, 2, 3, 'Offensive Down Stats', 600, 1000)
fig.show()
offDown 3rd Att r-squared value: 0.030280839799291304
offDown 3rd Md r-squared value: 0.1666561455342831
offDown 4th Att r-squared value: 0.15693334079780907
offDown 4th Md r-squared value: 0.02387728516010723
offDown Scrm Plys r-squared value: 0.06316357345557211
These stats are perhaps some of the more interesting ones we've seen thus far. offDown 3rd Md and offDown 4th Att are the only stats with r-squared values above our 0.1 threshold. offDown 4th Att being negatively related to W-L% makes sense: if you're regularly going for it on 4th down, your offense hasn't been converting on the earlier downs and is often playing from behind, so the other team will most likely win. Conversely, converting many times on 3rd down gives your team a good opportunity to score.
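One caveat worth spelling out: an r-squared value says nothing about direction, so a negative relationship like the one described for offDown 4th Att has to be read off the slope of the trendline or a signed correlation. Below is a minimal sketch with a hand-rolled Pearson correlation; pearson_r and the toy numbers are hypothetical and are not drawn from nfl_df.

```python
import math


def pearson_r(xs, ys):
    """Signed Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy values: more 4th down attempts, lower win percentage
fourth_down_attempts = [8, 12, 15, 20, 25]
win_pct = [0.75, 0.60, 0.55, 0.40, 0.30]

r = pearson_r(fourth_down_attempts, win_pct)
# For a one-variable linear regression, r-squared is simply r * r,
# so the sign is lost when only r-squared is reported
print(f'r = {r:.3f}, r-squared = {r * r:.3f}')
```

The sign of r (or of the fitted slope) tells us whether a stat helps or hurts, while r-squared only tells us how tightly the points follow the line.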
That's it for the offensive stats. Let's move on to the defensive ones.
fig = plot_stats(def_pass_dict, 4, 3, 'Defensive Passing Stats', 800, 1350)
fig.show()
defPass Att r-squared value: 0.12597871527669813
defPass Cmp r-squared value: 0.00922320513489483
defPass Cmp % r-squared value: 0.09094992799642121
defPass Yds r-squared value: 0.012165727634122603
defPass Yds/Att r-squared value: 0.22459438116239416
defPass TD r-squared value: 0.09983679094929732
defPass INT r-squared value: 0.18509585304933507
defPass 1st r-squared value: 0.0029879310671180326
defPass 1st% r-squared value: 0.1851283661674482
defPass Sck r-squared value: 0.16512819854091132
The five stats that clear our threshold are defPass Att, defPass Yds/Att, defPass INT, defPass 1st%, and defPass Sck. Up until this point, we were only focusing on the offensive side of things, so it's fitting that defPass INT and defPass Sck both measure how disruptive a team's defense is. If the defense is able to force more turnovers and sacks, giving their offense more chances to score, it stands to reason that these would be good indicators of a team's win-loss percentage. defPass Yds/Att and defPass 1st% measure how much a defense gives up per pass, and a defense that constantly allows opposing offenses to convert is likely not on a team that will win very many games. defPass Att is kind of an anomaly here, as the number of passes an opposing offense attempts shouldn't really mean much in terms of win-loss percentage.
Let's now take a look at some defensive rushing stats.
fig = plot_stats(def_rush_dict, 2, 3, 'Defensive Rushing Stats', 600, 1200)
fig.show()
defRush Att r-squared value: 0.41510439665542886
defRush Rush Yds r-squared value: 0.2299101620227917
defRush YPC r-squared value: 0.013862334660252995
defRush TD r-squared value: 0.19519711149220975
defRush Rush 1st r-squared value: 0.18435716501208166
defRush Rush 1st% r-squared value: 0.008380223480684457
Interestingly, this is the first graph in a while where most of the stats clear the threshold: four of the six have r-squared values above 0.1, with defRush YPC and defRush Rush 1st% being the exceptions. Intuitively, this makes sense, as the more an opposing offense is able to rush against a defense, the weaker that defense is, and a weaker defense leads to more points being scored on that team.
Now let's look at defensive scoring statistics.
fig = plot_stats(def_scor_dict, 1, 3, 'Defensive Scoring Stats', 350, 850)
fig.show()
defScor FR TD r-squared value: 0.008897507551636097
defScor SFTY r-squared value: 0.0016563020235650372
defScor INT TD r-squared value: 0.05320887229166116
As we can see from the graph, none of these stats have any significant linear relationship with W-L%. Part of the reason is that defensive scoring is quite rare: very few of these plays occur during a season, so most of the values are clustered between 0 and 2, which leaves little variation for a regression line to pick up.
Hopefully the defensive downs stats will be more interesting.
fig = plot_stats(def_down_dict, 2, 3, 'Defensive Down Stats', 600, 1000)
fig.show()
defDown 3rd Att r-squared value: 0.003775937514040928
defDown 3rd Md r-squared value: 0.0881996157706636
defDown 4th Att r-squared value: 0.1884079241011879
defDown 4th Md r-squared value: 0.04411745341106055
defDown Scrm Plys r-squared value: 0.04579900369944234
It turns out, not really. defDown 4th Att is the only significant stat here, and the reasoning is similar to the one given for offDown 4th Att, just in the other direction since we're dealing with defenses and not offenses.
Finally, let's have a look at defensive interception stats.
fig = plot_stats(def_int_dict, 1, 3, 'Defensive Interception Stats', 350, 750)
fig.show()
defInt INT Yds r-squared value: 0.0801502244788026
defInt Lng r-squared value: 0.018617041874587348
Turn Marg r-squared value: 0.27628358447365897
Once again, there's not much to show other than Turn Marg. A team's turnover margin is important for determining how well that team will fare in the regular season, since fewer turnovers mean more chances for the team's offense to score. There are a few outliers that skew the regression line a bit, but overall, turnover margin and win-loss percentage are linearly related.
Now that we've looked at all the stats and determined which ones have the strongest linear correlations, we can drop the ones that don't and store the result in nfl_df_trim. Note that the list below is hand-picked: of the stats plotted above, four with r-squared values under 0.1 (SoS, offPass Att, offRush YPC, and offRec Rec) are not in it and therefore stay in the dataframe. offRush YPC, for one, is used in a later visualization.
# Drop the columns with r-squared val < 0.1 (SoS, offPass Att, offRush YPC, and offRec Rec are kept)
nfl_df_trim = nfl_df.drop(
columns={'offPass Pass Yds',
'offPass Cmp',
'offPass 40+',
'offPass Lng',
'offRush 20+',
'offRush 40+',
'offRush Lng',
'offRush Rush 1st%',
'offRush Rush FUM',
'offRec Yds',
'offRec 40+',
'offRec Lng',
'offRec Rec FUM',
'offScor 2-PT',
'offDown 3rd Att',
'offDown 4th Md',
'offDown Scrm Plys',
'defPass Cmp',
'defPass Cmp %',
'defPass Yds',
'defPass TD',
'defPass 1st',
'defRush YPC',
'defRush Rush 1st%',
'defScor FR TD',
'defScor SFTY',
'defScor INT TD',
'defDown 3rd Att',
'defDown 3rd Md',
'defDown 4th Md',
'defDown Scrm Plys',
'defInt INT Yds',
'defInt Lng'
})
nfl_df_trim
| | Team | W-L% | PD | MoV | SoS | SRS | OSRS | DSRS | Conference | Made Playoffs | ... | offRec Yds/Rec | offRec TD | offRec 20+ | offRec Rec 1st | offRec Rec 1st% | Won SB | Lost SB | Made SB | N Yds/Att | Turn Marg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2002 New York Jets | 0.563 | 23 | 1.4 | 1.7 | 3.2 | 0.9 | 2.3 | AFC | True | ... | 11.0 | 25 | 44 | 190 | 57.8 | False | False | False | 6.61 | -2 |
| 1 | 2002 New England Patriots | 0.563 | 35 | 2.2 | 1.8 | 4.0 | 2.1 | 1.9 | AFC | False | ... | 10.1 | 28 | 37 | 184 | 49.2 | False | False | False | 5.62 | -7 |
| 2 | 2002 Miami Dolphins | 0.563 | 77 | 4.8 | 1.2 | 6.1 | 1.7 | 4.4 | AFC | False | ... | 11.3 | 18 | 38 | 155 | 57.2 | False | False | False | 6.02 | -10 |
| 3 | 2002 Buffalo Bills | 0.500 | -18 | -1.1 | 0.9 | -0.3 | 2.1 | -2.3 | AFC | False | ... | 11.6 | 24 | 45 | 218 | 57.8 | False | False | False | 5.99 | -20 |
| 4 | 2002 Pittsburgh Steelers | 0.656 | 45 | 2.8 | -0.1 | 2.7 | 3.1 | -0.4 | AFC | True | ... | 11.5 | 26 | 51 | 199 | 56.9 | False | False | False | 6.55 | -15 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 667 | 2022 Atlanta Falcons | 0.412 | -21 | -1.2 | -0.9 | -2.1 | -0.1 | -2.0 | NFC | False | ... | 11.4 | 17 | 37 | 148 | 57.6 | False | False | False | 5.97 | -3 |
| 668 | 2022 San Francisco 49ers | 0.765 | 173 | 10.2 | -2.3 | 7.9 | 3.3 | 4.6 | NFC | True | ... | 12.0 | 30 | 56 | 188 | 55.6 | False | False | False | 7.10 | 12 |
| 669 | 2022 Seattle Seahawks | 0.529 | 6 | 0.4 | -0.8 | -0.5 | 1.9 | -2.4 | NFC | True | ... | 10.7 | 30 | 50 | 206 | 51.6 | False | False | False | 6.35 | 6 |
| 670 | 2022 Los Angeles Rams | 0.294 | -77 | -4.5 | 0.5 | -4.0 | -4.1 | 0.0 | NFC | False | ... | 10.0 | 16 | 37 | 180 | 52.0 | False | False | False | 5.26 | 1 |
| 671 | 2022 Arizona Cardinals | 0.235 | -109 | -6.4 | 0.2 | -6.2 | -1.9 | -4.3 | NFC | False | ... | 9.2 | 17 | 40 | 189 | 43.6 | False | False | False | 5.10 | -9 |
672 rows × 53 columns
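The drop list above was assembled by hand from the printed r-squared values. As a rough sketch of how the same selection could be automated, the snippet below splits stats by the 0.1 threshold. The values are the offensive down r-squared figures printed earlier, rounded to four decimals; split_by_threshold and R2_THRESHOLD are hypothetical names introduced only for this illustration.

```python
R2_THRESHOLD = 0.1

# r-squared values from the offensive down output, rounded to 4 decimals
off_down_r2 = {
    'offDown 3rd Att': 0.0303,
    'offDown 3rd Md': 0.1667,
    'offDown 4th Att': 0.1569,
    'offDown 4th Md': 0.0239,
    'offDown Scrm Plys': 0.0632,
}


def split_by_threshold(r2_by_stat, threshold=R2_THRESHOLD):
    """Return (keep, drop) lists of stat names, preserving input order."""
    keep = [stat for stat, r2 in r2_by_stat.items() if r2 >= threshold]
    drop = [stat for stat, r2 in r2_by_stat.items() if r2 < threshold]
    return keep, drop


keep, drop = split_by_threshold(off_down_r2)
print('keep:', keep)
print('drop:', drop)
```

The resulting drop list matches the offDown entries in the hand-written list, and a list like it could be passed to nfl_df.drop(columns=...). Doing this for every category would require plot_stats to return its r-squared values instead of only printing them, which is a change the notebook does not make.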
sb_df and sb_mean
To best figure out which teams will have any sort of playoff success or any chance at the Super Bowl, we will find the average values for each stat and use these as a metric to filter out the teams with little to no chance at the playoffs.
Before that however, let's store this playoff-specific data into a new dataframe called sb_df, which will hold data on all teams that punched their ticket to the postseason in the last 21 years.
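# Keep only playoff teams; .copy() makes sb_df independent of nfl_df_trim so later column assignments don't raise a SettingWithCopyWarning
sb_df = nfl_df_trim[nfl_df_trim['Made Playoffs']].copy()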
sb_df
| | Team | W-L% | PD | MoV | SoS | SRS | OSRS | DSRS | Conference | Made Playoffs | ... | offRec Yds/Rec | offRec TD | offRec 20+ | offRec Rec 1st | offRec Rec 1st% | Won SB | Lost SB | Made SB | N Yds/Att | Turn Marg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2002 New York Jets | 0.563 | 23 | 1.4 | 1.7 | 3.2 | 0.9 | 2.3 | AFC | True | ... | 11.0 | 25 | 44 | 190 | 57.8 | False | False | False | 6.61 | -2 |
| 4 | 2002 Pittsburgh Steelers | 0.656 | 45 | 2.8 | -0.1 | 2.7 | 3.1 | -0.4 | AFC | True | ... | 11.5 | 26 | 51 | 199 | 56.9 | False | False | False | 6.55 | -15 |
| 5 | 2002 Cleveland Browns | 0.563 | 24 | 1.5 | -0.3 | 1.2 | -0.4 | 1.7 | AFC | True | ... | 10.8 | 27 | 47 | 171 | 50.6 | False | False | False | 5.81 | -12 |
| 8 | 2002 Tennessee Titans | 0.688 | 43 | 2.7 | -0.9 | 1.8 | 1.6 | 0.1 | AFC | True | ... | 11.2 | 22 | 35 | 182 | 59.5 | False | False | False | 6.37 | -4 |
| 9 | 2002 Indianapolis Colts | 0.625 | 36 | 2.3 | -1.1 | 1.2 | 0.4 | 0.7 | AFC | True | ... | 10.7 | 27 | 51 | 213 | 54.3 | False | False | False | 6.60 | -21 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 658 | 2022 New York Giants | 0.559 | -6 | -0.4 | 0.0 | -0.4 | -0.8 | 0.4 | NFC | True | ... | 9.9 | 17 | 28 | 170 | 49.1 | False | False | False | 5.54 | 6 |
| 660 | 2022 Minnesota Vikings | 0.765 | -3 | -0.2 | 0.1 | -0.1 | 2.8 | -2.9 | NFC | True | ... | 10.8 | 30 | 49 | 244 | 54.5 | False | False | False | 6.23 | 3 |
| 664 | 2022 Tampa Bay Buccaneers | 0.471 | -45 | -2.6 | 0.4 | -2.3 | -3.3 | 1.1 | NFC | True | ... | 9.5 | 26 | 49 | 240 | 48.1 | False | False | False | 5.93 | 2 |
| 668 | 2022 San Francisco 49ers | 0.765 | 173 | 10.2 | -2.3 | 7.9 | 3.3 | 4.6 | NFC | True | ... | 12.0 | 30 | 56 | 188 | 55.6 | False | False | False | 7.10 | 12 |
| 669 | 2022 Seattle Seahawks | 0.529 | 6 | 0.4 | -0.8 | -0.5 | 1.9 | -2.4 | NFC | True | ... | 10.7 | 30 | 50 | 206 | 51.6 | False | False | False | 6.35 | 6 |
258 rows × 53 columns
Let's now take the average of all the numerical stats to get an idea of what a Super Bowl-bound team looks like. This will help us narrow down which teams have a chance at hoisting the Lombardi Trophy. We'll store this in sb_mean.
# sb_mean = sb_df.loc[:, ~sb_df.columns.isin(['Index', 'Year', 'Team', 'Conference', 'Won SB', 'Made SB'])]
# sb_mean = (sb_mean.mean()).apply(lambda x: math.trunc(100 * x) / 100)
# sb_mean
W-L% 0.76
PD 130.69
MoV 8.13
SoS -0.18
SRS 7.94
OSRS 5.20
DSRS 2.74
Made Playoffs 1.00
offPass Att 549.30
offPass Cmp % 64.04
offPass Yds/Att 7.77
offPass TD 30.57
offPass INT 11.78
offPass Rate 97.27
offPass 1st 206.71
offPass 1st% 37.55
offPass 20+ 55.80
offPass Sck 31.64
offPass SckY 206.64
defPass Att 567.95
defPass Yds/Att 6.10
defPass INT 17.88
defPass 1st% 32.54
defPass Sck 42.78
offRush Att 453.73
offRush Rush Yds 1910.61
offRush YPC 4.19
offRush TD 16.40
offRush Rush 1st 107.83
defRush Att 408.21
defRush Rush Yds 1680.57
defRush TD 10.69
defRush Rush 1st 92.38
offScor Tot TD 50.14
offDown 3rd Md 86.66
offDown 4th Att 13.14
defDown 4th Att 19.90
defFumb FF 16.40
defFumb FR 4.52
defFumb FR TD 0.42
offRec Rec 352.64
offRec Yds/Rec 12.15
offRec TD 30.57
offRec 20+ 55.80
offRec Rec 1st 206.69
offRec Rec 1st% 58.61
Lost SB 0.50
N Yds/Att 6.98
Turn Marg 2.66
dtype: float64
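The commented-out cell above shortens each mean with math.trunc(100 * x) / 100. As a small sketch of what that expression does, the snippet below wraps it in a hypothetical helper named trunc2 and compares it with round on two made-up inputs.

```python
import math


def trunc2(x):
    """Cut x off after two decimal places (truncates toward zero, no rounding)."""
    return math.trunc(100 * x) / 100


# Made-up inputs chosen to show where truncation and rounding differ
for value in (7.779, -0.189):
    print(f'{value}: trunc2 = {trunc2(value)}, round = {round(value, 2)}')
```

Truncation always moves a value toward zero, so positive means are nudged down and negative means are nudged up relative to rounding. For a quick summary table that difference is harmless, but round(x, 2) would be the more conventional choice.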
Let's use sb_df to look at some of the stats surrounding teams that make the playoffs.
First, let's see the efficiency of each team that made the postseason. We can visualize efficiency by plotting DSRS against OSRS. We'll differentiate between Super Bowl winners and the rest. This will give us an idea as to how efficient each group was during the regular season.
plot = px.scatter(sb_df, x='OSRS', y='DSRS',
color='Won SB',
# color_discrete_sequence=px.colors.qualitative.Pastel,
trendline='ols',
category_orders={'Won SB': [True, False]},
labels={
'OSRS': 'Offensive Simple Rating System',
'DSRS': 'Defensive Simple Rating System',
},
title='Regular Season Efficiency (DSRS vs OSRS) of each Team that made the Playoffs')
plot.show()
From the graph, we can see that the teams who won the Super Bowl generally have a higher efficiency than teams who didn't. In general, the higher your DSRS is for the regular season, the better your chances are at winning the Super Bowl, whereas a high OSRS on its own doesn't appear to translate into Super Bowl wins as reliably. This makes sense: there's a well-known saying in the NFL that defense wins championships, and it appears the data agrees with that sentiment.
Now that we've looked at both types of postseason teams, let's take a moment to focus on just Super Bowl winners. What regular-season statistics make these teams perform so well? Let's start by reanalyzing efficiency (DSRS vs OSRS) in a density map. The rectangles correspond to a range for each axis, and the number indicates how many teams meet both of these criteria. For instance, 1 Super Bowl-winning team had a DSRS between 9 and 10.9, and an OSRS between -2 and -0.1. Before we do this, we'll cast the Won SB and Lost SB variables as integers so that they can be summed in our heatmaps.
sb_df['Lost SB'] = sb_df['Lost SB'].astype(int)
sb_df['Won SB'] = sb_df['Won SB'].astype(int)
sb_df.head()
| | Team | W-L% | PD | MoV | SoS | SRS | OSRS | DSRS | Conference | Made Playoffs | ... | offRec Yds/Rec | offRec TD | offRec 20+ | offRec Rec 1st | offRec Rec 1st% | Won SB | Lost SB | Made SB | N Yds/Att | Turn Marg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2002 New York Jets | 0.563 | 23 | 1.4 | 1.7 | 3.2 | 0.9 | 2.3 | AFC | True | ... | 11.0 | 25 | 44 | 190 | 57.8 | 0 | 0 | False | 6.61 | -2 |
| 4 | 2002 Pittsburgh Steelers | 0.656 | 45 | 2.8 | -0.1 | 2.7 | 3.1 | -0.4 | AFC | True | ... | 11.5 | 26 | 51 | 199 | 56.9 | 0 | 0 | False | 6.55 | -15 |
| 5 | 2002 Cleveland Browns | 0.563 | 24 | 1.5 | -0.3 | 1.2 | -0.4 | 1.7 | AFC | True | ... | 10.8 | 27 | 47 | 171 | 50.6 | 0 | 0 | False | 5.81 | -12 |
| 8 | 2002 Tennessee Titans | 0.688 | 43 | 2.7 | -0.9 | 1.8 | 1.6 | 0.1 | AFC | True | ... | 11.2 | 22 | 35 | 182 | 59.5 | 0 | 0 | False | 6.37 | -4 |
| 9 | 2002 Indianapolis Colts | 0.625 | 36 | 2.3 | -1.1 | 1.2 | 0.4 | 0.7 | AFC | True | ... | 10.7 | 27 | 51 | 213 | 54.3 | 0 | 0 | False | 6.60 | -21 |
5 rows × 53 columns
fig = px.density_heatmap(sb_df,
x='OSRS',
y='DSRS',
z='Won SB',
histfunc='sum',
text_auto=True,
labels={
'OSRS': 'Offensive Simple Rating System',
'DSRS': 'Defensive Simple Rating System',
'Won SB': 'Super Bowl Winners'
},
title='Density Map of Defensive vs Offensive Simple Rating System by Super Bowl Winner')
fig.show()
From this density map, it's clear that nearly every Super Bowl-winning team from 2002 onwards has had a non-negative DSRS, emphasizing the need for defense. However, all but one team have also had a non-negative OSRS, highlighting that stat's importance as well. This goes to show that any team that wants a Lombardi must have both a good offense and a good defense, which seems obvious.
However, as stated before, DSRS seems to be an important factor here, as the higher up the y-axis you go, the more Super Bowl-winning teams you see. Conversely, a higher OSRS doesn't necessarily translate to a Super Bowl-winning team's success.
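To make the numbers in the density map less of a black box, here is a minimal sketch of the idea behind histfunc='sum' on a 0/1 column: group the points into rectangular bins and add up the flag inside each bin. The fixed bin width of 2 and the toy teams are assumptions for illustration only; Plotly picks its own bin edges automatically, and bin_sums is a hypothetical helper rather than anything Plotly exposes.

```python
import math
from collections import defaultdict


def bin_sums(points, width):
    """Sum z over square bins of the given width.

    points: iterable of (x, y, z) tuples
    returns: {(x_lower_edge, y_lower_edge): summed z}
    """
    totals = defaultdict(int)
    for x, y, z in points:
        # Lower edge of the bin that contains each coordinate
        key = (math.floor(x / width) * width, math.floor(y / width) * width)
        totals[key] += z
    return dict(totals)


# Hypothetical (OSRS, DSRS, Won SB) triples, not taken from sb_df
teams = [
    (3.2, 2.3, 0),
    (4.1, 3.9, 1),
    (2.5, 2.2, 1),
    (-1.0, 9.5, 1),
    (6.0, -0.4, 0),
]

counts = bin_sums(teams, width=2)
for (osrs_lo, dsrs_lo), winners in sorted(counts.items()):
    print(f'OSRS [{osrs_lo}, {osrs_lo + 2}), DSRS [{dsrs_lo}, {dsrs_lo + 2}): {winners}')
```

Because Won SB is 0 or 1, summing it per bin is the same as counting the Super Bowl winners that fall in that rectangle, which is why the column needs to be an integer rather than a boolean.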
Let's compare the rest of the playoff teams to the Super Bowl-winning teams, looking at the exact same graph.
# Flag every playoff team that did not win the Super Bowl (Won SB is already 0/1 at this point)
sb_df['Lost Pst'] = 1 - sb_df['Won SB']
fig = px.density_heatmap(sb_df,
x='OSRS',
y='DSRS',
z='Lost Pst',
histfunc='sum',
text_auto=True,
labels={
'OSRS': 'Offensive Simple Rating System',
'DSRS': 'Defensive Simple Rating System',
'Lost Pst': 'Postseason Losers'
},
title='Density Map of Defensive vs Offensive Simple Rating System by Postseason Losers')
fig.show()
From this density map, it appears that most postseason teams are concentrated between 0 and 5 in both DSRS and OSRS. There's also more of an emphasis on OSRS as opposed to DSRS, which, as we mentioned for Super Bowl-winning teams, doesn't necessarily translate to Super Bowl success. This shows why defense is king, as all of these playoff teams fell short in the postseason. It also appears that Super Bowl-winning teams generally have better offenses than the rest of these playoff teams, which, again, stresses the importance of being a well-balanced team.
These next few graphs will be visualizing the stats of only the Super Bowl winners. Remember, this is our end goal, not just looking at playoff teams.
Now let's take a look at a Super Bowl-winning team's Turn Marg against their MoV. I decided to combine two stats that capture overall margins (both offense- and defense-related) and plot them to see if there were any similarities.
fig = px.density_heatmap(sb_df,
x='MoV',
y='Turn Marg',
z='Won SB',
histfunc='sum',
text_auto='.0f',
nbinsy=9,
labels={
'MoV': 'Margin of Victory',
'Turn Marg': 'Turnover Margin',
'Won SB': 'Super Bowl Winners'
},
title='Density Map of Turnover Margin vs Margin of Victory by Super Bowl Winner')
fig.show()
From the graph, it appears that nearly all Super Bowl-winning teams have both a non-negative turnover margin and a non-negative margin of victory. This means, in general, that Super Bowl-winning teams outscore their opponents over the regular season. It also means that their defenses force a lot of turnovers and/or their offenses protect the ball well. This furthers the narrative that Super Bowl-winning teams are well-rounded teams, with superb offense and defense at their disposal.
Let's see how SRS stacks up against W-L%.
# fig = px.density_heatmap(sb_df, x='MoV', y='Turn Marg', z='W-L%', histfunc='avg', text_auto='.2f')
fig = px.density_heatmap(sb_df,
x='W-L%',
y='SRS',
z='Won SB',
histfunc='sum',
text_auto='.0f',
nbinsx=9,
labels={
'W-L%': 'Win Loss Percentage',
'SRS': 'Simple Rating System',
'Won SB': 'Super Bowl Winners'
},
title='Density Map of Simple Rating System vs Win Loss Percentage by Super Bowl Winner')
fig.show()
Something interesting to note is that every Super Bowl-winning team since 2002 has had an above-average SRS (> 0) and a winning record (W-L% above .500). The sweet spot appears to be an SRS between 5 and 10 and a win-loss percentage between 70% and 80%. In general, most Super Bowl winners have an SRS value above 5.
Our final visualization will look at the average yards per carry and yards per reception that Super Bowl winners produce during the regular season.
fig = px.density_heatmap(sb_df,
x='offRush YPC',
y='offRec Yds/Rec',
z='Won SB',
histfunc='sum',
text_auto='.0f',
labels={
'offRush YPC': 'Average Yards Per Carry',
'offRec Yds/Rec': 'Average Receiving Yards by Reception',
'Won SB': 'Super Bowl Winners'
},
title='Density Map of Average Yards Per Carry vs Average Receiving Yards by Reception by Super Bowl Winner')
fig.show()
From the graph, we can see that most Super Bowl winners averaged a YPC above 3.8 and a Yds/Rec above 11, which actually isn't far off from the average of all NFL teams. The mean YPC is around 4.2 and the mean Yds/Rec is around 11.4 (as we can calculate from the data), so the league averages are actually higher than the marks of many Super Bowl-winning teams. Again, this emphasizes the need for defense, and shows that an average offense with decent defense can make a serious run for the Lombardi.